The design and implementation of precise static analyzers for significant fragments of modern imperative languages like C, C++, Java and Python is a challenging problem. In this paper, we consider a core imperative language that has several features found in mainstream languages, such as recursive functions, run-time system and user-defined exceptions, and a realistic data and memory model. For this language we provide a concrete semantics, characterizing both finite and infinite computations, and a generic abstract semantics that we prove sound with respect to the concrete one. We say the abstract semantics is generic since it is designed to be completely parametric on the analysis domains: in particular, it provides support for relational domains (i.e., abstract domains that can capture the relationships between different data objects). We also sketch how the proposed methodology can be extended to accommodate a larger language that includes pointers, compound data objects and non-structured control flow mechanisms. The approach, which is based on structured, big-step G∞SOS operational semantics and on abstract interpretation, is modular in that the overall static analyzer is naturally partitioned into components with clearly identified responsibilities and interfaces, something that greatly simplifies both the proof of correctness and the implementation.

Categories and Subject Descriptors: F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs.

General Terms: Languages, Verification. 

Additional Key Words and Phrases: Abstract interpretation, structured operational semantics. 



1. INTRODUCTION 

The last few years have witnessed significant progress toward achieving the ideal of the program verification grand challenge [Hoa03]. Still, the distance separating us from that ideal can be measured by the substantial lack of available tools that are able to verify the absence of relevant classes of run-time errors in code written in (reasonably rich fragments of) mainstream imperative languages like C, C++, Java and Python. True: there is a handful of commercial products that target generic applications written in C, but little is known about them. In contrast, several papers explain the essence of the techniques employed by the ASTREE analyzer to formally and automatically verify the absence of run-time errors in large safety-critical embedded control/command codes [BCC+02; BCC+03]; however, ASTREE is specifically targeted at a particular class of programs and program properties, so that widening its scope of application is likely to require significant effort [Cou05]. It is interesting to observe that, among the dozens of software development tools that are freely available, there are hardly any that, by analyzing the program semantics, are able to certify the absence of important classes of run-time hazards such as, say, the widely known buffer overflows in C code.

The reason for the current, extreme scarcity of the resource "precise analyzers for mainstream programming languages" is that the design and implementation of such analyzers is a very challenging problem. The theory of abstract interpretation [CC77a; CC92a] is crucial to the management of the complexity of this problem and, in fact, both ASTREE and the existing commercial analyzers are (as far as we know) based on it. Static analysis via abstract interpretation is conducted by mimicking the execution of the analyzed programs on an abstract domain. This is a set of computable representations of program properties equipped with all the operations required to mirror, in an approximate though correct way, the real, concrete executions of the program. Over the last decade, research and development on abstract domains has led to the availability of several implementations of a wide range of abstract domains: from the most efficient though imprecise, to the most precise though inefficient. Simplification and acceleration techniques have also been developed to mitigate the effects of this complexity/precision trade-off. So the lack of semantics-based static analyzers is not ascribable to a shortage of abstract domains and their implementations. The point is that there is more to a working analyzer than a collection of abstract domains:

(i) A concrete semantics must be selected for the analyzed language that models all the aspects of executions that are relevant to the properties of interest. This semantics must be recognizable as a sound characterization of the language at the intended level of abstraction.

(ii) An abstract semantics must be selected and correlated to the concrete semantics. This requires a proof of correctness that, while greatly simplified by abstract interpretation theory, can be a time-consuming task requiring highly qualified individuals.

(iii) An algorithm to finitely and efficiently compute (approximations of) the abstract semantics must be selected.

(iv) For good results, the abstract domain needs to be an object that is both complex and easily adaptable. So, instead of designing a new domain from scratch, it is often better if one can be obtained by combining simpler, existing abstract domains. Even though the theory of abstract interpretation provides important conceptual instruments for the design of such a combination, a significant effort is still needed to achieve, as far as possible, the desired precision and efficiency levels. Note that this point can have an impact on points (ii) and (iii): a generic abstract semantics has the advantage of not requiring an entirely new proof and a new algorithm each time the abstract domain changes.
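To give a concrete feel for points (ii) and (iii), the following is a minimal sketch (not taken from ECLAIR) of how an analyzer mimics execution on an abstract domain. It assumes the classical non-relational domain of integer intervals, with a join, the standard interval widening to force termination of the fixpoint iteration, and one descending step to recover precision. The class `Interval`, the function `analyze_counter_loop` and the analyzed toy loop `x := 0; while x < n do x := x + 1` are hypothetical choices made for illustration only.

```python
import math
from dataclasses import dataclass

INF = math.inf


@dataclass(frozen=True)
class Interval:
    """Abstract integer value [lo, hi]; any lo > hi encodes bottom (no value)."""
    lo: float
    hi: float

    def is_bottom(self) -> bool:
        return self.lo > self.hi

    def join(self, other: "Interval") -> "Interval":
        # Least upper bound: the smallest interval containing both operands.
        if self.is_bottom():
            return other
        if other.is_bottom():
            return self
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, other: "Interval") -> "Interval":
        # Standard interval widening: a bound that is not stable jumps to
        # infinity, so every ascending iteration terminates.
        if self.is_bottom():
            return other
        if other.is_bottom():
            return self
        lo = self.lo if other.lo >= self.lo else -INF
        hi = self.hi if other.hi <= self.hi else INF
        return Interval(lo, hi)

    def add_const(self, c: int) -> "Interval":
        # Abstract counterpart of the assignment x := x + c.
        return self if self.is_bottom() else Interval(self.lo + c, self.hi + c)

    def assume_lt(self, c: int) -> "Interval":
        # Abstract filter for the guard x < c (on integers: x <= c - 1).
        return Interval(self.lo, min(self.hi, c - 1))

    def assume_ge(self, c: int) -> "Interval":
        # Abstract filter for the loop exit condition x >= c.
        return Interval(max(self.lo, c), self.hi)


def analyze_counter_loop(n: int) -> Interval:
    """Abstractly execute  x := 0; while x < n do x := x + 1
    and return the interval of x at loop exit."""
    init = Interval(0, 0)

    def step(head: Interval) -> Interval:
        # Loop-head transfer function: entry value joined with the body output.
        return init.join(head.assume_lt(n).add_const(1))

    head = init
    while True:                       # ascending iteration with widening
        nxt = head.widen(step(head))
        if nxt == head:               # step(head) is now below head
            break
        head = nxt
    head = step(head)                 # one descending step refines the result
    return head.assume_ge(n)
```

With `n = 100`, widening alone stabilizes the loop head at `[0, +inf]`, which would give `[100, +inf]` at exit; the single descending step restores the upper bound, so the sketch reports `[100, 100]`. A relational domain would be needed to relate `x` to a symbolic `n`.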
This paper, which is the first product of a long-term research plan that is meant to deal with all of the points above, specifically addresses points (i) and (ii) and refers to a slight generalization of existing techniques for point (iii).

1.1 Contribution 

We build on ideas that have been around for quite some time but, as far as we know, have never been sufficiently elaborated to be applied to the description and analysis of realistic imperative languages. In brief, the contribution consists in filling a good portion of the gaps that have impeded the application of these ideas to complex imperative programming languages such as C.¹

More precisely, here we define the concrete and generic abstract semantics constructions for a language, called CPM, that incorporates all the features of mainstream, single-threaded imperative programming languages that can be problematic from the point of view of static analysis. Most notably, the CPM language features: a non-toy memory model; exceptions; run-time errors modeled via exceptions (for instance, an exception is raised whenever a division by zero is attempted, when a stack allocation request causes a stack overflow or when other memory errors occur); array types; pointer types to both data objects and functions; short-circuit evaluation of Boolean operators; user-defined (possibly recursive) functions; and non-structured control flow mechanisms.

For the description of the concrete dynamic semantics of the language we have 
used a structured operational semantics (SOS) approach extended to deal with 
infinite computations, mainly building on the work of Kahn, Plotkin and Cousot. 
With respect to what can be found in the literature, we have added the treatment 
of all non-structured control flow mechanisms of the C language. Of course, as 
the ultimate goal of this research is to end up with practical analysis tools, the 
concrete dynamic semantics has been defined in order to facilitate as much as 
possible the subsequent abstraction phase. Still, our dynamic semantics retains all 
the traditional good features: in particular, the concrete rule schemata are plainly 
readable (assuming the reader becomes sufficiently familiar with the unavoidable 
notational conventions) and fairly concise. 

For the abstract semantics, we build on the work of Schmidt by providing the concrete dynamic semantics rules with abstract counterparts. As far as we know, this is the first time that Schmidt's proposal is applied to the analysis of a realistic programming language [D. Schmidt, personal communication, 2004]. A remarkable feature of our abstract semantics is that it is truly generic in that it fully supports relational abstract domains: the key step in this direction is the identification and specification of a suitable set of operators on (concrete and abstract) memory structures that allow for domain-independent approximations without inherent limitations on the obtainable precision.



1 It is worth noting that we improperly refer to the C language to actually mean some more constrained language, like CIL, the C Intermediate Language described in [NMRW02], where all ambiguities have been removed, together with an ABI (Application Binary Interface) that further defines its semantics. Similarly, by 'Python' we mean a tractable subset of the language, such as the RPython subset being developed by the PyPy project (http://pypy.org/).
Schmidt's proposal about the abstract interpretation of natural semantics has, in our opinion, two important advantages: concrete and abstract rules can be made executable and are easily correlated. We review these two aspects in turn.

Even though here we do not provide details in this respect, a prototype system called ECLAIR² has been developed in parallel with the writing of the present paper. The Prolog implementation exploits nice features of a semantics construction based on the SOS approach: the concrete semantics rule schemata can be directly translated into Prolog clauses; and the resulting interpreter, with the help of a C++ implementation of memory structures, is efficient enough to run non-trivial programs. Similar considerations apply to the modules implementing the abstract semantics: the abstract semantics rules are almost directly translated to generic Prolog code that is interfaced with specialized libraries implementing several abstract domains, including accurate ones such as those provided by the Parma Polyhedra Library [BHRZ05; BHZ05; BHZ06]. So, following this approach, the distance between the expression of the concrete semantics and its executable realization is, as is well known, very small; but the same can be said about the distance between the specification of the abstract semantics and the static analyzer that results from its implementation. This prototype system therefore gives us confidence that both the concrete and abstract semantics are correctly modeled and that, in this paper, no real difficulties have been overlooked.

For space reasons, only a subset of CPM is treated in full depth in the main body of the paper (the extension of the design to the full language is only briefly described, even though all the important points are covered). For this subset, we give a complete proof of correctness that relates the abstract semantics to the concrete semantics. The proofs are not complicated and suggest (also because of the way we present them) the possibility of their automation. To summarize, at this stage of the research work it does not seem unreasonable that we may end up with: readable and executable representations of the concrete semantics of mainstream programming languages; readable and executable representations of program analyzers; correctness of the analyzers established by automatic specialized theorem provers; and, at last, availability of sophisticated program analyzers for such languages.

A final word is due to address the following concern: if the target languages are "real" imperative programming languages, why choose CPM, an unreal one? The reason is indeed quite simple: Java and Python lack some of the "hard" features of C; C lacks exceptions; C++ is too hard, for the time being. So, choosing any one of these real languages would have been unlikely to provide us with the answer we were looking for, which was about the adequacy of Schmidt's approach with respect to the above goals. Moreover, in its ECLAIR realization, the CPM language is being extended so as to become a superset of C (i.e., with all the floating-point and integer types, cast and bitwise operators and so forth). Once that code has stabilized, a C and a Java subsystem will be forked.



2 The 'Extended CLAIR' system targets the analysis of mainstream programming languages by 
building upon CLAIR, the 'Combined Language and Abstract Interpretation Resource', which 
was initially developed and used in a teaching context (see http://www.cs.unipr.it/clair/). 
1.2 Related Work

The literature on abstract interpretation proposes several frameworks for static analysis, where the more general approaches put forward in foundational papers are partially specialized according to a given criterion. For a few examples of specializations based on the programming paradigm, one can mention the frameworks in [Bru91] and [GDL92] for the analysis of (constraint) logic programs; the approach in [CC94] for the analysis of functional programs; and the so-called "Marktoberdorf '98 generic static analyzer" specified in [Cou99] for the analysis of imperative programs.

All of these frameworks are "generic" in that, while fixing some of the parameters of the considered problem, they are still characterized by several degrees of freedom. It is therefore natural to reason on the similarities and differences between these approaches. However, independently of the programming paradigm under analysis, direct comparisons between frameworks are extremely difficult in that each proposal typically focuses on the solution of a subset of the relevant issues, while partially disregarding other important problems. For instance, both [Bru91] and [GDL92] study the generic algebraic properties that allow for a clean and safe separation between the abstract domains and the abstract interpreter; in contrast, [Cou99] provides full details for a specific instance of the proposed framework, ranging from the parsing of literal constants to the explicit implementation of the abstract operators for the abstract domain of intervals. On the other hand, the frameworks mentioned above differ from the one presented in this paper in that they allow for significant simplifications of the language analyzed. Here we briefly discuss the main differences between the language considered in our proposal and the one in [Cou99].

At the syntactic level, as already mentioned, the language CPM is much richer than the simple imperative language adopted in [Cou99], which has no support for functions, nesting of block statements, exceptions or non-structured control flows, and allows for a single data type (in particular, no pointers and arrays). These syntactic differences are clearly mirrored at the semantic level. In particular, even though the detection of initialization and arithmetic errors is considered by the semantics in [Cou99], the actual process of error propagation is not modeled. In contrast, the semantics construction we propose can easily accommodate the sophisticated exception propagation and handling mechanisms that can be found in modern languages such as C++, Java and Python. Note that this choice has a non-trivial impact on the specification of the other components of the semantic construction. For example, the short-circuit evaluation of Boolean expressions cannot be normalized as proposed in [Cou99] because such a normalization process, by influencing the order of evaluation of subexpressions, is unable to preserve the concrete semantics as far as exceptional computation paths are concerned. A minor difference is in the modeling of integer variables and values: while [Cou99] considers the case of possibly uninitialized variables taking values in a finite set of machine-representable integers, for ease of presentation we have opted for definitely initialized variables storing arbitrary (i.e., unbounded) integer values. Since the CPM language supports an extensible set of RTS exceptions, the specification of a semantics modeling (the generation, propagation and handling of) uninitialization errors is rather straightforward. An extension of the semantics to the case of several sets of bounded and unbounded numerical types, with suitable type conversion functions, is under development. Another difference is in the generality of the abstract semantics construction: following the approach described here, an analyzer can take full advantage of the more accurate information provided by a relational domain such as that of polyhedra. In contrast, the work in [Cou99] only considers the simpler case of non-relational abstract domains. As mentioned above, the semantics we propose also models the case of possibly recursive functions (with a call-by-value parameter passing mechanism), which are not supported by the language syntax considered in [Cou99]. While both this paper and [Cou99] consider the specification of a forward static analysis framework, [Cou99] also provides a backward analysis for arithmetic expressions, to be used in reductive iterations so as to reduce the precision losses that are usually incurred by non-relational approximations.

1.3 Plan of the Paper 

The paper is organized as follows. Section 2 introduces the notation and terminology used throughout the paper; Section 3 defines the syntax of a subset of the imperative language CPM, whereas Section 4 defines its static semantics; the concrete dynamic semantics of this fragment is presented in Section 5, whereas its abstract counterpart is defined in Section 6. The proof of correctness of the abstract semantics is the subject of Section 7, while the computation of further approximations is treated in Section 8. The integration of the full CPM language in the analysis framework presented in this paper is discussed in Section 9. Section 10 concludes.

2. PRELIMINARIES 

Let $S$ and $T$ be sets. The notation $S \subseteq_{\mathrm{f}} T$ means that $S$ is a finite subset of $T$. We write $S \uplus T$ to denote the union $S \cup T$, yet emphasizing the fact that $S \cap T = \emptyset$. The set of total (resp., partial) functions from $S$ to $T$ is denoted by $S \to T$ (resp., $S \rightarrowtail T$). We denote by $\mathrm{dom}(f)$ the domain of a function $f \colon S \to T$ (resp., $f \colon S \rightarrowtail T$), where $\mathrm{dom}(f) = S$ (resp., $\mathrm{dom}(f) \subseteq S$). Let $(S, \preceq)$ be a partial order and $f \colon S \to S$ be a function. An element $x \in S$ such that $x = f(x)$ (resp., $x \preceq f(x)$) is called a fixpoint (resp., post-fixpoint) of $f$. The notation $\mathrm{lfp}_{\preceq}(f)$ (resp., $\mathrm{gfp}_{\preceq}(f)$) stands, if it exists, for the least (resp., greatest) fixpoint of $f$. A complete lattice is a partial order $(S, \preceq)$ such that $\mathrm{lub}\,T$ exists for each $T \subseteq S$. If $f \colon S \to S$ is monotonic over the complete lattice $S$, the Knaster-Tarski theorem ensures that the set of post-fixpoints of $f$ is itself a complete lattice. The fixpoint coinduction proof principle follows: if $f$ is monotonic over the complete lattice $S$ then, in order to prove that $x \preceq \mathrm{gfp}_{\preceq}(f)$, it is sufficient to prove that $x \preceq f(x)$.
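These fixpoint notions can be exercised on a finite powerset lattice ordered by inclusion, where a monotone function reaches its least fixpoint by iterating upward from the bottom element and its greatest fixpoint by iterating downward from the top element (on infinite lattices such iterations need not terminate, and reaching the least fixpoint in the limit requires continuity rather than mere monotonicity). The following minimal Python sketch uses a hypothetical transition graph and the monotone function "states having some successor in X"; its greatest fixpoint is the set of states from which an infinite path starts, which hints at why greatest fixpoints are convenient for infinite computations. The final assertions instantiate the coinduction principle. All names are illustrative.

```python
from typing import Callable, Dict, FrozenSet

State = int
Elem = FrozenSet[State]   # an element of the powerset lattice (ordered by subset)


def iterate(f: Callable[[Elem], Elem], start: Elem) -> Elem:
    """Iterate start, f(start), f(f(start)), ... until the value stabilizes.

    For monotone f on a finite lattice, starting from the bottom yields
    lfp(f) and starting from the top yields gfp(f)."""
    x = start
    while True:
        y = f(x)
        if y == x:
            return x
        x = y


# Hypothetical transition graph: 0 -> 1 -> 2 -> 1 is a cycle; 3 -> 4, and 4 is stuck.
SUCC: Dict[State, FrozenSet[State]] = {
    0: frozenset({1}), 1: frozenset({2}), 2: frozenset({1}),
    3: frozenset({4}), 4: frozenset(),
}
TOP: Elem = frozenset(SUCC)
BOTTOM: Elem = frozenset()


def has_successor_in(x: Elem) -> Elem:
    """Monotone f(X) = { n | some successor of n belongs to X }."""
    return frozenset(n for n in TOP if SUCC[n] & x)


least = iterate(has_successor_in, BOTTOM)    # lfp: the empty set
greatest = iterate(has_successor_in, TOP)    # gfp: states with an infinite path

# Coinduction: X <= f(X) (X is a post-fixpoint) implies X <= gfp(f).
candidate = frozenset({1, 2})
assert candidate <= has_successor_in(candidate)
assert candidate <= greatest
```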

Let $S = \{s_1, \ldots, s_n\}$ be a finite set of cardinality $n \geq 0$. Then, the notation $\{s_1 \mapsto t_1, \ldots, s_n \mapsto t_n\}$, where $\{t_1, \ldots, t_n\} \subseteq T$, stands for the function $f \colon S \to T$ such that $f(s_i) = t_i$, for each $i = 1, \ldots, n$. Note that, assuming that the codomain $T$ is clear from context, the empty set denotes the (nowhere defined) function $f \colon \emptyset \to T$.

When denoting the application of a function $f \colon (S_1 \times \cdots \times S_n) \to T$ we omit, as customary, the outer parentheses and write $f(s_1, \ldots, s_n)$ to mean $f\bigl((s_1, \ldots, s_n)\bigr)$.
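Let $f_0 \colon S_0 \rightarrowtail T_0$ and $f_1 \colon S_1 \rightarrowtail T_1$ be partial functions. Then the function $f_0[f_1] \colon (S_0 \cup S_1) \rightarrowtail (T_0 \cup T_1)$ is defined, for each $x \in \mathrm{dom}(f_0) \cup \mathrm{dom}(f_1)$, by
$$
f_0[f_1](x) \stackrel{\mathrm{def}}{=}
\begin{cases}
f_1(x), & \text{if } x \in \mathrm{dom}(f_1);\\
f_0(x), & \text{if } x \in \mathrm{dom}(f_0) \setminus \mathrm{dom}(f_1).
\end{cases}
$$
(Note that, if $f_0$ and $f_1$ are total functions, then $f_0[f_1]$ is total too.)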

For a partial function $f \colon S \rightarrowtail T$ and a set $S' \subseteq S$, $f|_{S'}$ denotes the restriction of $f$ to $S'$, i.e., the function $f|_{S'} \colon S' \rightarrowtail T$ defined, for each $x \in S' \cap \mathrm{dom}(f)$, by $f|_{S'}(x) = f(x)$. (Note that, if $f$ is a total function, then $f|_{S'}$ is total too.) With a minor abuse of notation, we will sometimes write $f \setminus S''$ to denote $f|_{S \setminus S''}$.
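If partial functions are represented as Python dictionaries (an assumption made purely for illustration), the update $f_0[f_1]$ lets $f_1$ take precedence on the common domain, which is how an inner type environment shadows an outer one in expressions such as $\beta[\beta_0]$. The following minimal sketch implements update, restriction, the $\setminus$ shorthand and a checked disjoint union; the function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def update(f0: Dict[K, V], f1: Dict[K, V]) -> Dict[K, V]:
    """f0[f1]: defined on dom(f0) | dom(f1); f1 wins where both are defined."""
    result = dict(f0)
    result.update(f1)
    return result


def restrict(f: Dict[K, V], s: Iterable[K]) -> Dict[K, V]:
    """f|_S': keep only the points of S' where f is defined."""
    keep = set(s)
    return {x: v for x, v in f.items() if x in keep}


def remove(f: Dict[K, V], s: Iterable[K]) -> Dict[K, V]:
    """f \\ S'': the restriction of f to everything outside S''."""
    drop = set(s)
    return {x: v for x, v in f.items() if x not in drop}


def disjoint_union(s: Set[K], t: Set[K]) -> Set[K]:
    """S (+) T: the union S | T, checking that S and T are disjoint."""
    if s & t:
        raise ValueError("operands of a disjoint union must not overlap")
    return s | t


# Usage: an inner type environment shadows an outer one.
beta = {"x": "integer loc", "y": "boolean loc"}
beta0 = {"y": "integer loc"}
assert update(beta, beta0) == {"x": "integer loc", "y": "integer loc"}
```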

$S^*$ denotes the set of all finite, possibly empty strings of symbols taken from $S$. The empty string is denoted by $\epsilon$. If $w, z \in S \cup S^*$, the concatenation of $w$ and $z$ is an element of $S^*$ denoted by $wz$ or, to avoid ambiguities, by $w \cdot z$. The length of a string $z$ is denoted by $|z|$.

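The integer part function $\mathrm{int} \colon \mathbb{R} \to \mathbb{Z}$ is given, for each $x \in \mathbb{R}$, by $\mathrm{int}(x) \stackrel{\mathrm{def}}{=} \lfloor x \rfloor$, if $x \geq 0$, and $\mathrm{int}(x) \stackrel{\mathrm{def}}{=} \lceil x \rceil$, if $x < 0$. The integer division and the modulo operations $\div, \bmod \colon \bigl(\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})\bigr) \to \mathbb{Z}$ are defined, for each $x, y \in \mathbb{Z}$ with $y \neq 0$, respectively by $x \div y \stackrel{\mathrm{def}}{=} \mathrm{int}(x / y)$ and $x \bmod y \stackrel{\mathrm{def}}{=} x - (x \div y) \cdot y$.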
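Since $\mathrm{int}$ truncates toward zero, $\div$ and $\bmod$ differ from Python's built-in `//` and `%` (which round toward negative infinity) whenever $x / y$ is negative and not an integer. The following minimal sketch follows the definitions literally and computes $x / y$ with exact rational arithmetic to avoid floating-point rounding; the function names are hypothetical.

```python
import math
from fractions import Fraction


def int_part(x: Fraction) -> int:
    """int(x): floor(x) if x >= 0, ceil(x) if x < 0, i.e. truncation toward zero."""
    return math.floor(x) if x >= 0 else math.ceil(x)


def cpm_div(x: int, y: int) -> int:
    """x div y = int(x / y), with x / y computed exactly as a rational number."""
    if y == 0:
        # In CPM the run-time support raises divbyzero; a Python
        # ZeroDivisionError stands in for it in this sketch.
        raise ZeroDivisionError("divbyzero")
    return int_part(Fraction(x, y))


def cpm_mod(x: int, y: int) -> int:
    """x mod y = x - (x div y) * y; a nonzero result has the sign of x."""
    return x - cpm_div(x, y) * y


# Usage: compare with Python's floor-based operators.
for x, y in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(x, y, cpm_div(x, y), cpm_mod(x, y), x // y, x % y)
```

The truncating convention is the same one that C99 prescribes for its `/` and `%` operators on integers.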

We assume familiarity with the field of program analysis and verification via 
abstract interpretation. The reader is referred to the literature for the theory 
(e.g., [Cou81; CC76; CC77a; CC79; CC92a; CC92c]) and examples of applications 
[DRS01; Hal93; SKS00]. 

3. THE LANGUAGE SYNTAX 

The run-time support of CPM uses exceptions to communicate run-time errors. 
The set of RTS exceptions is left open so that it can be extended if and when 
needed. That said, the basic syntactic sets of the CPM language are: 

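Identifiers. $\mathrm{id} \in \mathrm{Id} = \{\mathrm{main}, \mathrm{x}, \mathrm{x}_0, \mathrm{x}_1, \ldots\} \uplus \mathrm{rId}$, where $\mathrm{rId} \stackrel{\mathrm{def}}{=} \{\underline{\mathrm{x}}, \underline{\mathrm{x}}_0, \underline{\mathrm{x}}_1, \ldots\}$;
Basic types. $T \in \mathrm{Type} = \{\mathrm{integer}, \mathrm{boolean}\}$;
Integers. $m \in \mathrm{Integer} \stackrel{\mathrm{def}}{=} \mathbb{Z}$;
Booleans. $t \in \mathrm{Bool} \stackrel{\mathrm{def}}{=} \{\mathrm{tt}, \mathrm{ff}\}$;
RTS exceptions. $\chi \in \mathrm{RTSExcept} = \{\mathtt{divbyzero}, \mathtt{stkovflw}, \mathtt{memerror}, \ldots\}$.

The identifiers in $\mathrm{rId}$ are "reserved" for the specification of the concrete semantics.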

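From the basic sets, a number of syntactic categories are defined, along with their syntactic meta-variables, by means of the BNF rules:

Expressions.
$\mathrm{Exp} \ni e ::= m \mid -e \mid e_0 + e_1 \mid e_0 - e_1 \mid e_0 * e_1 \mid e_0 / e_1 \mid e_0 \mathbin{\%} e_1$
$\qquad \mid t \mid e_0 = e_1 \mid e_0 \neq e_1 \mid e_0 < e_1 \mid e_0 \leq e_1 \mid e_0 \geq e_1 \mid e_0 > e_1$
$\qquad \mid \mathbf{not}\ e \mid e_0\ \mathbf{and}\ e_1 \mid e_0\ \mathbf{or}\ e_1 \mid \mathrm{id}$
Sequences of expressions.
$\mathrm{Exps} \ni \mathrm{es} ::= \square \mid e, \mathrm{es}$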



8 • R. Bagnara, P.M. Hill, A. Pescetti, and E. Zaffanella 

Storable types.
$\mathrm{sType} \ni \mathrm{sT} ::= T$
Formal parameters.
$\mathrm{formParams} \ni \mathrm{fps} ::= \square \mid \mathrm{id} : \mathrm{sT}, \mathrm{fps}$
Function bodies.
$\mathrm{Body} \ni \mathrm{body} ::= \mathbf{let}\ d\ \mathbf{in}\ s\ \mathbf{result}\ e \mid \mathbf{extern} : \mathrm{sT}$
Global declarations.
$\mathrm{Glob} \ni g ::= \mathbf{gvar}\ \mathrm{id} : \mathrm{sT} = e \mid \mathbf{function}\ \mathrm{id}(\mathrm{fps}) = \mathrm{body} \mid \mathbf{rec}\ g \mid g_0 ; g_1$
Local declarations.
$\mathrm{Decl} \ni d ::= \mathbf{nil} \mid \mathbf{lvar}\ \mathrm{id} : \mathrm{sT} = e \mid d_0 ; d_1$
Catchable types.
$\mathrm{cType} \ni \mathrm{cT} ::= \mathtt{rts\_exception} \mid \mathrm{sT}$
Exception declarations.
$\mathrm{exceptDecl} \ni p ::= \chi \mid \mathrm{cT} \mid \mathrm{id} : \mathrm{sT} \mid \mathbf{any}$
Catch clauses.
$\mathrm{Catch} \ni k ::= (p)\ s \mid k_0 ; k_1$
Statements.
$\mathrm{Stmt} \ni s ::= \mathbf{nop} \mid \mathrm{id} := e \mid \mathrm{id}_0 := \mathrm{id}(\mathrm{es}) \mid s_0 ; s_1 \mid d ; s$
$\qquad \mid \mathbf{if}\ e\ \mathbf{then}\ s_0\ \mathbf{else}\ s_1 \mid \mathbf{while}\ e\ \mathbf{do}\ s$
$\qquad \mid \mathbf{throw}\ \chi \mid \mathbf{throw}\ e \mid \mathbf{try}\ s\ \mathbf{catch}\ k \mid \mathbf{try}\ s_0\ \mathbf{finally}\ s_1$

Observe that there is no need for a separate syntactic category for programs: as we will see, a CPM program is just a global declaration defining the special function 'main', as in C and C++.

It should be noted that some apparent limitations of the abstract syntax of CPM are not real limitations. For instance: the use of function calls as expressions can be avoided by introducing temporary variables; procedures can be rendered by functions that return a dummy value; and so forth. More generally, a slight elaboration of the abstract syntax presented here and extended in Section 9 is used in the ECLAIR prototype to encode the C language almost in its entirety, plus the basic exception handling mechanisms of C++ and Java.

For notational convenience, we also define the syntactic categories of constants, storable values³ and exceptions:

Constants.
$\mathrm{Con} \ni \mathrm{con} ::= m \mid t$
Storable values.
$\mathrm{sVal} \ni \mathrm{sval} ::= \mathrm{con}$
Exceptions.
$\mathrm{Except} \ni \xi ::= \chi \mid \mathrm{sval}$

3 The reason for a distinction between the roles of constants and storable values (as well as basic types and storable types) will become clear when discussing language extensions in Section 9.

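The (partial) function $\mathrm{type} \colon \mathrm{sVal} \rightarrowtail \mathrm{sType}$, mapping a storable value to its type name 'integer' or 'boolean', is defined by:
$$
\mathrm{type}(\mathrm{sval}) \stackrel{\mathrm{def}}{=}
\begin{cases}
\mathrm{integer}, & \text{if } \mathrm{sval} = m \in \mathrm{Integer};\\
\mathrm{boolean}, & \text{if } \mathrm{sval} = t \in \mathrm{Bool}.
\end{cases}
$$
For ease of notation, we also define the overloadings $\mathrm{type} \colon \mathrm{Except} \rightarrowtail \mathrm{cType}$ and $\mathrm{type} \colon \mathrm{exceptDecl} \rightarrowtail \mathrm{cType}$ defined by
$$
\mathrm{type}(\xi) \stackrel{\mathrm{def}}{=}
\begin{cases}
\mathtt{rts\_exception}, & \text{if } \xi = \chi \in \mathrm{RTSExcept};\\
\mathrm{type}(\mathrm{sval}), & \text{if } \xi = \mathrm{sval} \in \mathrm{sVal};
\end{cases}
\qquad
\mathrm{type}(p) \stackrel{\mathrm{def}}{=}
\begin{cases}
\mathtt{rts\_exception}, & \text{if } p = \chi \in \mathrm{RTSExcept};\\
\mathrm{cT}, & \text{if } p = \mathrm{cT} \in \mathrm{cType};\\
\mathrm{sT}, & \text{if } p = \mathrm{id} : \mathrm{sT} \text{ and } \mathrm{sT} \in \mathrm{sType}.
\end{cases}
$$
Note that such an overloading is consistent and the resulting function is not defined on value $\mathbf{any} \in \mathrm{exceptDecl}$.

The helper function $\mathrm{dom} \colon \mathrm{cType} \to \{\mathrm{Integer}, \mathrm{Bool}, \mathrm{RTSExcept}\}$, which associates a catchable type name to the corresponding domain, is defined by
$$
\mathrm{dom}(\mathrm{cT}) \stackrel{\mathrm{def}}{=}
\begin{cases}
\mathrm{Integer}, & \text{if } \mathrm{cT} = \mathrm{integer};\\
\mathrm{Bool}, & \text{if } \mathrm{cT} = \mathrm{boolean};\\
\mathrm{RTSExcept}, & \text{if } \mathrm{cT} = \mathtt{rts\_exception}.
\end{cases}
$$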
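The functions type and dom transcribe directly into code. The sketch below assumes, purely for illustration, that CPM integers are Python `int`, Booleans are Python `bool`, and RTS exceptions are strings drawn from a hypothetical, extensible set `RTS_EXCEPTIONS`; the function names are illustrative and not part of any existing implementation.

```python
from typing import Union

# Assumed encoding of the open set RTSExcept (extend as needed).
RTS_EXCEPTIONS = {"divbyzero", "stkovflw", "memerror"}

SVal = Union[int, bool]          # storable values: Integer (Z) or Bool
Except = Union[str, int, bool]   # exceptions: an RTS exception or a storable value


def type_sval(sval: SVal) -> str:
    """type : sVal -> sType."""
    # bool must be tested first, because bool is a subclass of int in Python.
    if isinstance(sval, bool):
        return "boolean"
    if isinstance(sval, int):
        return "integer"
    raise ValueError(f"type is undefined on {sval!r}")


def type_except(xi: Except) -> str:
    """type : Except -> cType."""
    if isinstance(xi, str):
        if xi in RTS_EXCEPTIONS:
            return "rts_exception"
        raise ValueError(f"unknown RTS exception {xi!r}")
    return type_sval(xi)


def dom(ct: str) -> str:
    """dom : cType -> {Integer, Bool, RTSExcept}, with domains named by strings."""
    return {"integer": "Integer", "boolean": "Bool", "rts_exception": "RTSExcept"}[ct]
```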



4. STATIC SEMANTICS 

The static semantics of the CPM language establishes the conditions under which 
a program is well typed. Only well-typed programs are given a dynamic semantics. 

4.1 Defined and Free Identifiers 

The set of identifiers defined by sequences of formal parameters, (global or local) 
declarations or exception declarations is defined as follows: 

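$$
\begin{aligned}
\mathrm{DI}(\square) &\stackrel{\mathrm{def}}{=} \mathrm{DI}(\mathbf{nil}) \stackrel{\mathrm{def}}{=} \mathrm{DI}(\mathrm{body}) \stackrel{\mathrm{def}}{=} \mathrm{DI}(\chi) \stackrel{\mathrm{def}}{=} \mathrm{DI}(\mathrm{cT}) \stackrel{\mathrm{def}}{=} \mathrm{DI}(\mathbf{any}) \stackrel{\mathrm{def}}{=} \emptyset;\\
\mathrm{DI}(\mathrm{id} : \mathrm{sT}) &\stackrel{\mathrm{def}}{=} \mathrm{DI}(\mathbf{gvar}\ \mathrm{id} : \mathrm{sT} = e) \stackrel{\mathrm{def}}{=} \mathrm{DI}(\mathbf{lvar}\ \mathrm{id} : \mathrm{sT} = e)\\
&\stackrel{\mathrm{def}}{=} \mathrm{DI}(\mathbf{function}\ \mathrm{id}(\mathrm{fps}) = \mathrm{body}) \stackrel{\mathrm{def}}{=} \{\mathrm{id}\};\\
\mathrm{DI}(\mathrm{id} : \mathrm{sT}, \mathrm{fps}) &\stackrel{\mathrm{def}}{=} \mathrm{DI}(\mathrm{id} : \mathrm{sT}) \cup \mathrm{DI}(\mathrm{fps});\\
\mathrm{DI}(\mathbf{rec}\ g) &\stackrel{\mathrm{def}}{=} \mathrm{DI}(g);\\
\mathrm{DI}(g_0 ; g_1) &\stackrel{\mathrm{def}}{=} \mathrm{DI}(g_0) \cup \mathrm{DI}(g_1);\\
\mathrm{DI}(d_0 ; d_1) &\stackrel{\mathrm{def}}{=} \mathrm{DI}(d_0) \cup \mathrm{DI}(d_1).
\end{aligned}
$$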
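The definition of DI is a plain structural recursion. The Python sketch below covers declarations and formal parameters only; the dataclasses are a hypothetical encoding of the abstract syntax in which `Var` stands for both `gvar` and `lvar`, and initializers and function bodies are omitted because DI never inspects them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

Params = Tuple[Tuple[str, str], ...]   # formal parameters as (id, sT) pairs


@dataclass(frozen=True)
class Nil:                       # nil
    pass


@dataclass(frozen=True)
class Var:                       # gvar id : sT = e  or  lvar id : sT = e
    id: str
    stype: str


@dataclass(frozen=True)
class Function:                  # function id(fps) = body
    id: str
    fps: Params


@dataclass(frozen=True)
class Rec:                       # rec g
    g: "Decl"


@dataclass(frozen=True)
class Seq:                       # g0 ; g1  or  d0 ; d1
    first: "Decl"
    second: "Decl"


Decl = Union[Nil, Var, Function, Rec, Seq]


def DI(d: Decl) -> FrozenSet[str]:
    """Identifiers defined by a (global or local) declaration."""
    if isinstance(d, Nil):
        return frozenset()
    if isinstance(d, (Var, Function)):
        return frozenset({d.id})
    if isinstance(d, Rec):
        return DI(d.g)
    if isinstance(d, Seq):
        return DI(d.first) | DI(d.second)
    raise TypeError(f"not a declaration: {d!r}")


def DI_fps(fps: Params) -> FrozenSet[str]:
    """DI(id : sT, fps) = {id} U DI(fps); DI of the empty sequence is empty."""
    return frozenset(pid for pid, _ in fps)
```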

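The set of identifiers that occur freely in (sequences of) expressions, (exception) declarations, statements and catch clauses is defined by:
$$
\begin{aligned}
\mathrm{FI}(m) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(t) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathbf{nop}) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\square) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathrm{id} : \mathrm{sT}) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathbf{nil})\\
&\stackrel{\mathrm{def}}{=} \mathrm{FI}(\chi) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathrm{cT}) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathbf{any}) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathbf{throw}\ \chi) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathbf{extern} : \mathrm{sT}) \stackrel{\mathrm{def}}{=} \emptyset;\\
\mathrm{FI}(-e) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathbf{not}\ e) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathbf{lvar}\ \mathrm{id} : \mathrm{sT} = e)\\
&\stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathbf{gvar}\ \mathrm{id} : \mathrm{sT} = e) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathbf{throw}\ e) \stackrel{\mathrm{def}}{=} \mathrm{FI}(e);\\
\mathrm{FI}(e_0\ \mathrm{op}\ e_1) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(e_0) \cup \mathrm{FI}(e_1), \quad \text{for } \mathrm{op} \in \{+, \ldots, \%, =, \ldots, >, \mathbf{and}, \mathbf{or}\};\\
\mathrm{FI}(\mathrm{id}) &\stackrel{\mathrm{def}}{=} \{\mathrm{id}\};\\
\mathrm{FI}(\mathbf{let}\ d\ \mathbf{in}\ s\ \mathbf{result}\ e) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(d) \cup \bigl(\mathrm{FI}(s) \setminus \mathrm{DI}(d)\bigr) \cup \bigl(\mathrm{FI}(e) \setminus \mathrm{DI}(d)\bigr);\\
\mathrm{FI}(\mathbf{function}\ \mathrm{id}(\mathrm{fps}) = \mathrm{body}) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathrm{body}) \setminus \mathrm{DI}(\mathrm{fps});\\
\mathrm{FI}(\mathbf{rec}\ g) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(g) \setminus \mathrm{DI}(g);\\
\mathrm{FI}(g_0 ; g_1) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(g_0) \cup \bigl(\mathrm{FI}(g_1) \setminus \mathrm{DI}(g_0)\bigr);\\
\mathrm{FI}(d_0 ; d_1) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(d_0) \cup \bigl(\mathrm{FI}(d_1) \setminus \mathrm{DI}(d_0)\bigr);\\
\mathrm{FI}(\mathrm{id} := e) &\stackrel{\mathrm{def}}{=} \{\mathrm{id}\} \cup \mathrm{FI}(e);\\
\mathrm{FI}(e, \mathrm{es}) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(e) \cup \mathrm{FI}(\mathrm{es});\\
\mathrm{FI}(\mathrm{id}_0 := \mathrm{id}(\mathrm{es})) &\stackrel{\mathrm{def}}{=} \{\mathrm{id}, \mathrm{id}_0\} \cup \mathrm{FI}(\mathrm{es});\\
\mathrm{FI}(d ; s) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(d) \cup \bigl(\mathrm{FI}(s) \setminus \mathrm{DI}(d)\bigr);\\
\mathrm{FI}\bigl((p)\ s\bigr) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(s) \setminus \mathrm{DI}(p);\\
\mathrm{FI}(k_0 ; k_1) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(k_0) \cup \mathrm{FI}(k_1);\\
\mathrm{FI}(s_0 ; s_1) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathbf{try}\ s_0\ \mathbf{finally}\ s_1) \stackrel{\mathrm{def}}{=} \mathrm{FI}(s_0) \cup \mathrm{FI}(s_1);\\
\mathrm{FI}(\mathbf{if}\ e\ \mathbf{then}\ s_0\ \mathbf{else}\ s_1) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(e) \cup \mathrm{FI}(s_0) \cup \mathrm{FI}(s_1);\\
\mathrm{FI}(\mathbf{while}\ e\ \mathbf{do}\ s) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(e) \cup \mathrm{FI}(s);\\
\mathrm{FI}(\mathbf{try}\ s\ \mathbf{catch}\ k) &\stackrel{\mathrm{def}}{=} \mathrm{FI}(s) \cup \mathrm{FI}(k).
\end{aligned}
$$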
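The scoping discipline encoded by FI (identifiers introduced by a declaration are removed from the free identifiers of its scope, but not from those of the declaration itself) can be sketched as follows. The Python dataclasses are a hypothetical encoding of a small fragment only (expressions, `lvar` declarations, assignment, `d ; s` and `while`), not the ECLAIR representation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Const:          # m or t
    value: object


@dataclass(frozen=True)
class Id:             # id
    name: str


@dataclass(frozen=True)
class Unary:          # -e or not e
    op: str
    e: "Exp"


@dataclass(frozen=True)
class Binary:         # e0 op e1
    op: str
    e0: "Exp"
    e1: "Exp"


Exp = Union[Const, Id, Unary, Binary]


@dataclass(frozen=True)
class LVar:           # lvar id : sT = e
    id: str
    stype: str
    e: Exp


@dataclass(frozen=True)
class Assign:         # id := e
    id: str
    e: Exp


@dataclass(frozen=True)
class Block:          # d ; s (here d is a single lvar declaration)
    d: LVar
    s: "Stmt"


@dataclass(frozen=True)
class While:          # while e do s
    e: Exp
    s: "Stmt"


Stmt = Union[Assign, Block, While]


def FI(t) -> FrozenSet[str]:
    """Free identifiers of a construct in this fragment of CPM."""
    if isinstance(t, Const):
        return frozenset()
    if isinstance(t, Id):
        return frozenset({t.name})
    if isinstance(t, Unary):
        return FI(t.e)
    if isinstance(t, Binary):
        return FI(t.e0) | FI(t.e1)
    if isinstance(t, LVar):
        return FI(t.e)                          # id is not bound inside e
    if isinstance(t, Assign):
        return frozenset({t.id}) | FI(t.e)
    if isinstance(t, Block):
        return FI(t.d) | (FI(t.s) - {t.d.id})   # DI(lvar id : sT = e) = {id}
    if isinstance(t, While):
        return FI(t.e) | FI(t.s)
    raise TypeError(f"unsupported construct: {t!r}")
```

For instance, in `lvar y : integer = x; while y < n do y := y + 1` the local `y` is bound, so the free identifiers are `x` and `n`.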

4.2 Type Environments 

We start by defining the convenience syntactic category of 

Denotable types. 

$\mathrm{dType} \ni \mathrm{dT} ::= \mathrm{sT}\ \mathrm{loc} \mid \mathrm{fps} \to \mathrm{sT}$

A type environment associates a denotable type to each identifier of a given, finite 
set of identifiers. 
Definition 4.1. ($\mathrm{TEnv}_I$, $\mathrm{TEnv}$.) For each $I \subseteq_{\mathrm{f}} \mathrm{Id}$, the set of type environments over $I$ is $\mathrm{TEnv}_I \stackrel{\mathrm{def}}{=} I \to \mathrm{dType}$; the set of all type environments is given by $\mathrm{TEnv} \stackrel{\mathrm{def}}{=} \biguplus_{I \subseteq_{\mathrm{f}} \mathrm{Id}} \mathrm{TEnv}_I$. Type environments are denoted by $\beta$, $\beta_0$, $\beta_1$ and so forth. The notation $\beta : I$ is a shorthand for $\beta \in \mathrm{TEnv}_I$.

4.3 Static Semantics Predicates 

Let $I \subseteq_{\mathrm{f}} \mathrm{Id}$ and $\beta \in \mathrm{TEnv}_I$. The well-typedness of program constructs whose free identifiers are contained in $I$ is encoded by the following predicates, here listed along with their informal meaning:

$\beta \vdash_I e : \mathrm{sT}$, $e$ is well formed and has type $\mathrm{sT}$ in $\beta$;
$\beta \vdash_I \mathrm{body} : \mathrm{sT}$, $\mathrm{body}$ is well formed and has type $\mathrm{sT}$ in $\beta$;
$\beta, \mathrm{fps} \vdash_I \mathrm{es}$, $\mathrm{es}$ is compatible with $\mathrm{fps}$ and well formed in $\beta$;
$\mathrm{fps} : \delta$, $\mathrm{fps}$ is well formed and yields the type environment $\delta$;
$\beta \vdash_I g : \delta$, $g$ is well formed and yields the type environment $\delta$ in $\beta$;
$\beta \vdash_I d : \delta$, $d$ is well formed and yields the type environment $\delta$ in $\beta$;
$\vdash_I p : \delta$, $p$ is well formed and yields the type environment $\delta$;
$\beta \vdash_I k$, $k$ is well formed in $\beta$;
$\beta \vdash_I s$, $s$ is well formed in $\beta$.



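These predicates are defined inductively on the abstract syntax by means of the following rules.

Expressions.
$$
\frac{}{\beta \vdash_I m : \mathrm{integer}}
\qquad
\frac{}{\beta \vdash_I t : \mathrm{boolean}}
\qquad
\frac{\beta \vdash_I e : \mathrm{integer}}{\beta \vdash_I -e : \mathrm{integer}}
\qquad
\frac{\beta \vdash_I e : \mathrm{boolean}}{\beta \vdash_I \mathbf{not}\ e : \mathrm{boolean}}
$$
$$
\frac{\beta \vdash_I e_0 : \mathrm{integer} \quad \beta \vdash_I e_1 : \mathrm{integer}}{\beta \vdash_I e_0 \boxplus e_1 : \mathrm{integer}} \quad \text{if } \boxplus \in \{+, -, *, /, \%\}
$$
$$
\frac{\beta \vdash_I e_0 : \mathrm{integer} \quad \beta \vdash_I e_1 : \mathrm{integer}}{\beta \vdash_I e_0 \bowtie e_1 : \mathrm{boolean}} \quad \text{if } \bowtie \in \{=, \neq, <, \leq, \geq, >\}
$$
$$
\frac{\beta \vdash_I e_0 : \mathrm{boolean} \quad \beta \vdash_I e_1 : \mathrm{boolean}}{\beta \vdash_I e_0 \circ e_1 : \mathrm{boolean}} \quad \text{if } \circ \in \{\mathbf{and}, \mathbf{or}\}
\qquad
\frac{}{\beta \vdash_I \mathrm{id} : \mathrm{sT}} \quad \text{if } \beta(\mathrm{id}) = \mathrm{sT}\ \mathrm{loc}
$$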
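Read algorithmically, the expression rules are syntax directed: each expression form is the conclusion of exactly one rule, so a checker can compute $\mathrm{sT}$ bottom-up. The following Python sketch does so under a hypothetical encoding in which a type environment maps identifiers to strings such as `"integer loc"` or `"fps -> integer"`; the names `type_exp`, `expect`, `IllTyped` and the AST classes are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Const:
    value: object            # m (a Python int) or t (a Python bool)


@dataclass(frozen=True)
class Id:
    name: str


@dataclass(frozen=True)
class Unary:
    op: str                  # "-" or "not"
    e: "Exp"


@dataclass(frozen=True)
class Binary:
    op: str
    e0: "Exp"
    e1: "Exp"


Exp = Union[Const, Id, Unary, Binary]
TEnv = Dict[str, str]        # e.g. {"x": "integer loc", "f": "fps -> integer"}

ARITH = {"+", "-", "*", "/", "%"}
COMPARE = {"=", "!=", "<", "<=", ">=", ">"}
LOGIC = {"and", "or"}


class IllTyped(Exception):
    """Raised when no typing rule applies."""


def type_exp(beta: TEnv, e: Exp) -> str:
    """Return sT such that beta |-_I e : sT holds, or raise IllTyped."""
    if isinstance(e, Const):
        # bool is tested first because bool is a subclass of int in Python.
        return "boolean" if isinstance(e.value, bool) else "integer"
    if isinstance(e, Id):
        dt = beta.get(e.name, "")
        if not dt.endswith(" loc"):          # side condition beta(id) = sT loc
            raise IllTyped(f"{e.name} is not a variable in scope")
        return dt[: -len(" loc")]
    if isinstance(e, Unary):
        operand = {"-": "integer", "not": "boolean"}.get(e.op)
        if operand is None:
            raise IllTyped(f"unknown unary operator {e.op}")
        expect(beta, e.e, operand)
        return operand
    if isinstance(e, Binary):
        if e.op in ARITH:
            operand, result = "integer", "integer"
        elif e.op in COMPARE:
            operand, result = "integer", "boolean"
        elif e.op in LOGIC:
            operand, result = "boolean", "boolean"
        else:
            raise IllTyped(f"unknown binary operator {e.op}")
        expect(beta, e.e0, operand)
        expect(beta, e.e1, operand)
        return result
    raise IllTyped(f"not an expression: {e!r}")


def expect(beta: TEnv, e: Exp, st: str) -> None:
    """Check a rule premise of the form beta |-_I e : st."""
    found = type_exp(beta, e)
    if found != st:
        raise IllTyped(f"expected {st}, found {found}")
```

Note that, following the rules, `=` and `!=` accept only integer operands.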

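Sequences of expressions.
$$
\frac{}{\beta, \square \vdash_I \square}
\qquad
\frac{\beta \vdash_I e : \mathrm{sT} \quad \beta, \mathrm{fps} \vdash_I \mathrm{es}}{\beta, (\mathrm{id} : \mathrm{sT}, \mathrm{fps}) \vdash_I (e, \mathrm{es})}
$$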
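Sequences of formal parameters.
$$
\frac{}{\square : \emptyset}
\qquad
\frac{\mathrm{fps} : \delta}{(\mathrm{id} : \mathrm{sT}, \mathrm{fps}) : \{\mathrm{id} \mapsto \mathrm{sT}\ \mathrm{loc}\} \cup \delta} \quad \text{if } \mathrm{id} \notin \mathrm{DI}(\mathrm{fps})
$$
Function bodies.
$$
\frac{\beta \vdash_I d : \beta_0 \quad \beta[\beta_0] \vdash_{I \cup \mathrm{DI}(d)} s \quad \beta[\beta_0] \vdash_{I \cup \mathrm{DI}(d)} e : \mathrm{sT}}{\beta \vdash_I (\mathbf{let}\ d\ \mathbf{in}\ s\ \mathbf{result}\ e) : \mathrm{sT}}
\qquad
\frac{}{\beta \vdash_I (\mathbf{extern} : \mathrm{sT}) : \mathrm{sT}}
$$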
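Declarations.
$$
\frac{}{\beta \vdash_I \mathbf{nil} : \emptyset}
\qquad
\frac{\beta \vdash_I e : \mathrm{sT}}{\beta \vdash_I \mathbf{gvar}\ \mathrm{id} : \mathrm{sT} = e : \{\mathrm{id} \mapsto \mathrm{sT}\ \mathrm{loc}\}}
\qquad
\frac{\beta \vdash_I e : \mathrm{sT}}{\beta \vdash_I \mathbf{lvar}\ \mathrm{id} : \mathrm{sT} = e : \{\mathrm{id} \mapsto \mathrm{sT}\ \mathrm{loc}\}}
$$
$$
\frac{\mathrm{fps} : \delta \quad \beta[\delta] \vdash_{I \cup \mathrm{DI}(\mathrm{fps})} \mathrm{body} : \mathrm{sT}}{\beta \vdash_I (\mathbf{function}\ \mathrm{id}(\mathrm{fps}) = \mathrm{body}) : \{\mathrm{id} \mapsto (\mathrm{fps} \to \mathrm{sT})\}}
$$
$$
\frac{\beta[\delta|_J] \vdash_{I \cup J} g : \delta}{\beta \vdash_I (\mathbf{rec}\ g) : \delta} \quad \text{if } J = \mathrm{FI}(g) \cap \mathrm{DI}(g) \text{ and } \forall \mathrm{id}, \mathrm{sT} : (\mathrm{id} \mapsto \mathrm{sT}\ \mathrm{loc}) \notin \delta \qquad (1)
$$
$$
\frac{\beta \vdash_I g_0 : \beta_0 \quad \beta[\beta_0] \vdash_{I \cup \mathrm{DI}(g_0)} g_1 : \beta_1}{\beta \vdash_I g_0 ; g_1 : \beta_0[\beta_1]}
\qquad
\frac{\beta \vdash_I d_0 : \beta_0 \quad \beta[\beta_0] \vdash_{I \cup \mathrm{DI}(d_0)} d_1 : \beta_1}{\beta \vdash_I d_0 ; d_1 : \beta_0[\beta_1]}
$$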

Note that rule (1) seems to suggest that $\delta$ must be guessed. In fact, this is not the case, as it can be proved that the environment generated by a declaration $g$ only depends on $g$ and not on the environment used to establish whether $g$ is well formed. While the right thing to do is to define two static semantics predicates for declarations (one for the generated environments and the other for well-formedness) [Plo04], we opted for a more concise presentation. Also notice that the side condition in rule (1) explicitly forbids recursive declarations of variables.⁴
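Exception declarations.
$$
\frac{}{\vdash_I \chi : \emptyset}
\qquad
\frac{}{\vdash_I \mathrm{cT} : \emptyset}
\qquad
\frac{}{\vdash_I \mathrm{id} : \mathrm{sT} : \{\mathrm{id} \mapsto \mathrm{sT}\ \mathrm{loc}\}}
\qquad
\frac{}{\vdash_I \mathbf{any} : \emptyset}
$$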

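Catch clauses.
$$
\frac{\vdash_I p : \delta \quad \beta[\delta] \vdash_{I \cup \mathrm{DI}(p)} s}{\beta \vdash_I (p)\ s}
\qquad
\frac{\beta \vdash_I k_0 \quad \beta \vdash_I k_1}{\beta \vdash_I k_0 ; k_1}
$$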



4 Namely, a recursive declaration such as $\mathbf{rec}\ \mathbf{gvar}\ \mathrm{id} : \mathrm{sT} = e$ is not well typed.
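Statements.
$$
\frac{}{\beta \vdash_I \mathbf{nop}}
\qquad
\frac{\beta \vdash_I e : \mathrm{sT}}{\beta \vdash_I \mathrm{id} := e} \quad \text{if } \beta(\mathrm{id}) = \mathrm{sT}\ \mathrm{loc}
\qquad
\frac{\beta, \mathrm{fps} \vdash_I \mathrm{es}}{\beta \vdash_I \mathrm{id}_0 := \mathrm{id}(\mathrm{es})} \quad \text{if } \beta(\mathrm{id}_0) = \mathrm{sT}\ \mathrm{loc} \text{ and } \beta(\mathrm{id}) = \mathrm{fps} \to \mathrm{sT}
$$
$$
\frac{\beta \vdash_I s_0 \quad \beta \vdash_I s_1}{\beta \vdash_I s_0 ; s_1}
\qquad
\frac{\beta \vdash_I d : \beta_0 \quad \beta[\beta_0] \vdash_{I \cup \mathrm{DI}(d)} s}{\beta \vdash_I d ; s}
$$
$$
\frac{\beta \vdash_I e : \mathrm{boolean} \quad \beta \vdash_I s_0 \quad \beta \vdash_I s_1}{\beta \vdash_I \mathbf{if}\ e\ \mathbf{then}\ s_0\ \mathbf{else}\ s_1}
\qquad
\frac{\beta \vdash_I e : \mathrm{boolean} \quad \beta \vdash_I s}{\beta \vdash_I \mathbf{while}\ e\ \mathbf{do}\ s}
$$
$$
\frac{}{\beta \vdash_I \mathbf{throw}\ \chi}
\qquad
\frac{\beta \vdash_I e : \mathrm{sT}}{\beta \vdash_I \mathbf{throw}\ e}
\qquad
\frac{\beta \vdash_I s \quad \beta \vdash_I k}{\beta \vdash_I \mathbf{try}\ s\ \mathbf{catch}\ k}
\qquad
\frac{\beta \vdash_I s_0 \quad \beta \vdash_I s_1}{\beta \vdash_I \mathbf{try}\ s_0\ \mathbf{finally}\ s_1}
$$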



A program $g$ is said to be valid if and only if it does not contain any occurrence of a reserved identifier $\mathrm{id} \in \mathrm{rId}$, $\emptyset \vdash_{\emptyset} g : \beta$ and $\beta(\mathrm{main}) = \square \to \mathrm{integer}$.

5. CONCRETE DYNAMIC SEMANTICS 

For the specification of the concrete dynamic semantics for CPM, we adopt the G∞SOS approach of Cousot and Cousot [CC92c]. This extends to infinite computations the natural semantics approach of Kahn [Kah87], which, in turn, is a "big-step" operational semantics defined by structural induction on program structures in the style of Plotkin [Plo04].

5.1 Absolute Locations and Indirect Locators 

An absolute location (or, simply, location) is a unique identifier for a memory area of unspecified size. The (possibly infinite) set of all locations is denoted by Loc, while individual locations are denoted by $l$, $l_0$, $l_1$ and so forth. We also postulate the existence of a set $\mathrm{Ind} \stackrel{\mathrm{def}}{=} \mathbb{N}$ of indirect (stack) locators such that $\mathrm{Loc} \cap \mathrm{Ind} = \emptyset$. Indirect locators are denoted by $i$, $i_0$, $i_1$ and so forth. For notational convenience, we define the set of addresses as $\mathrm{Addr} \stackrel{\mathrm{def}}{=} \mathrm{Loc} \uplus \mathrm{Ind}$. Addresses are denoted by $a$, $a_0$, $a_1$ and so forth.

5.2 Concrete Execution Environments 

The concrete dynamic aspect of declarations is captured by concrete execution 
environments. These map a finite set of identifiers to concrete denotable values. In 
the sequel we will simply write 'environment' to refer to execution environments. 

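Definition 5.1. (Abstract, dVal, $\mathrm{Env}_I$.) We define
$$\mathrm{Abstract} \stackrel{\mathrm{def}}{=} \{\, \lambda\, \mathrm{fps} .\, \mathrm{body} \mid \mathrm{fps} \in \mathrm{formParams}, \mathrm{body} \in \mathrm{Body} \,\}.$$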



14 • R. Bagnara, P.M. Hill, A. Pescetti, and E. Zaffanella 



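The set of concrete denotable values is
$$\mathrm{dVal} \stackrel{\mathrm{def}}{=} (\mathrm{Addr} \times \mathrm{sType}) \uplus \mathrm{Abstract}.$$
For $I \subseteq_f \mathrm{Id}$, $\mathrm{Env}_I \stackrel{\mathrm{def}}{=} I \to \mathrm{dVal}$ is the set of concrete environments over $I$. The set of all environments is given by $\mathrm{Env} \stackrel{\mathrm{def}}{=} \biguplus_{I \subseteq_f \mathrm{Id}} \mathrm{Env}_I$. Environments in $\mathrm{Env}_I$ are denoted by $\rho$, $\rho_0$, $\rho_1$ and so forth. We write $\rho : I$ as a shorthand for $\rho \in \mathrm{Env}_I$. For $\rho : I$ and $\beta : I$, we write $\rho : \beta$ to signify that
$$\begin{aligned} \forall \mathrm{id} \in I : {} & \bigl(\exists (a, \mathrm{sT}) \in \mathrm{Addr} \times \mathrm{sType} .\ \beta(\mathrm{id}) = \mathrm{sT\ loc} \wedge \rho(\mathrm{id}) = (a, \mathrm{sT})\bigr) \\ & \vee \bigl(\exists \mathrm{abs} = (\lambda\, \mathrm{fps} .\, \mathrm{body}) \in \mathrm{Abstract} .\ \beta(\mathrm{id}) = \mathrm{fps} \to \mathrm{sT} \wedge \beta \vdash_I \mathrm{body} : \mathrm{sT} \wedge \rho(\mathrm{id}) = \mathrm{abs}\bigr). \end{aligned}$$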

5.3 Memory Structures, Value States and Exception States 

A memory structure uses a stack and suitable operators to allocate/deallocate, organize, read and update the locations of an absolute memory map, which is a partial function mapping a location and a storable type to a storable value. Memory structures model all the memory areas that are used in the most common implementations of imperative programming languages: the data segment (for global variables) and the stack segment (for local variables) are of interest for the language fragment we are considering; the text segment (where pointers to functions point) and the heap segment (for dynamically allocated memory) are required to deal with the extensions of Section 9. As will be clear from the following definition, our notion of memory structure is underspecified: while we define it and its operations so that the semantics of programs is the expected one, we allow for many possible implementations by leaving out many details that are inessential to that objective. For the same reason, we treat locations as unique identifiers, neglecting whatever mathematical structure they may have. More generally, what we call "concrete semantics" is in fact an abstraction of an infinite number of machines and compilation schemes that could be used to execute our programs. Furthermore, since the considered fragment of CPM does not support pointers, arrays, type casts and unions, we can make the simplifying assumption that there is no overlap between the storage cells associated with different locations. In Section 9 we will hint at how these assumptions must be modified in order to accommodate the full language.

Memory structures will be used to describe the outcome of computations whose 
only observable behavior is given by their side effects. Computations yielding a 
proper value will be described by a value state, which pairs the value computed 
with a memory structure recording the side effects of the execution. Exceptional 
behavior must, of course, be taken into proper account: thus, the result of an 
exceptional computation path will be described by pairing the memory structure 
with an exception, yielding what we call an exception state. 

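Definition 5.2. (Map, Stack, Mem, ValState, ExceptState.) The set of all absolute maps is the set of partial functions
$$\mathrm{Map} \stackrel{\mathrm{def}}{=} (\mathrm{Loc} \times \mathrm{sType}) \rightarrowtail \mathrm{sVal}.$$
Absolute maps are denoted by $\mu$, $\mu_0$, $\mu_1$ and so forth. The absolute map update partial function
$$\cdot[\cdot := \cdot] : (\mathrm{Map} \times (\mathrm{Loc} \times \mathrm{sType}) \times \mathrm{sVal}) \rightarrowtail \mathrm{Map}$$
is defined, for each $\mu \in \mathrm{Map}$, $(l, \mathrm{sT}) \in \mathrm{Loc} \times \mathrm{sType}$ such that $(l, \mathrm{sT}) \in \mathrm{dom}(\mu)$ and $\mathrm{sval} \in \mathrm{sVal}$ such that $\mathrm{sT} = \mathrm{type}(\mathrm{sval})$, by
$$\mu[(l, \mathrm{sT}) := \mathrm{sval}] \stackrel{\mathrm{def}}{=} \mu',$$
where $\mu' \in \mathrm{Map}$ is any absolute map satisfying the following conditions:

(i) $\mathrm{dom}(\mu') = \mathrm{dom}(\mu)$;
(ii) $\mu'(l, \mathrm{sT}) = \mathrm{sval}$;
(iii) $\mu'(l', \mathrm{sT}') = \mu(l', \mathrm{sT}')$, for each $(l', \mathrm{sT}') \in \mathrm{dom}(\mu)$ such that $l' \neq l$.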

Let $W = (\mathrm{Loc} \cup \{\dagger, \ddagger\})^\star$. An element $w \in W$ is a stack if and only if no location occurs more than once in it. The set of all stacks is denoted by Stack; '†' is called the stack marker and '‡' is called the frame marker. The top-most frame of $w \in \mathrm{Stack}$, denoted by $\mathrm{tf}(w)$, is the longest suffix of $w$ containing no frame marker; formally, $\mathrm{tf}(w) \in (\mathrm{Loc} \cup \{\dagger\})^\star$ satisfies either $w = \mathrm{tf}(w)$ or $w = w' \ddagger \mathrm{tf}(w)$. The partial infix operator $@ : \mathrm{Stack} \times \mathrm{Ind} \rightarrowtail \mathrm{Loc}$ maps, when defined, a stack $w$ and an indirect locator $i$ into an absolute location to be found in the top-most frame; formally, if $i < n = |\mathrm{tf}(w)|$, $\mathrm{tf}(w) = z_0 \cdots z_{n-1}$ and $z_i = l$, then $w @ i \stackrel{\mathrm{def}}{=} l$.
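As an illustration only, the stack discipline and the partial operator @ on stacks can be prototyped as follows. This Python sketch is ours and rests on hypothetical encoding choices: locations are strings, the two markers are one-character strings, and the undefined cases of @ raise `KeyError`.

```python
from typing import List

# Hypothetical encoding: locations are strings; the stack marker and the
# frame marker are reserved one-character strings.
STACK_MARK = "†"
FRAME_MARK = "‡"


def is_stack(w: List[str]) -> bool:
    """w is a stack iff no location occurs more than once in it."""
    locs = [z for z in w if z not in (STACK_MARK, FRAME_MARK)]
    return len(locs) == len(set(locs))


def top_frame(w: List[str]) -> List[str]:
    """tf(w): the longest suffix of w that contains no frame marker."""
    for k in range(len(w) - 1, -1, -1):
        if w[k] == FRAME_MARK:
            return w[k + 1:]
    return list(w)


def at(w: List[str], i: int) -> str:
    """w @ i: the i-th element of tf(w), defined only when it is a location."""
    tf = top_frame(w)
    if 0 <= i < len(tf) and tf[i] != STACK_MARK:
        return tf[i]
    raise KeyError(f"w @ {i} is undefined")
```

For instance, with w = g0 ‡ l1 † l2 we get tf(w) = l1 † l2, so w @ 0 = l1 and w @ 2 = l2, whereas w @ 1 is undefined because position 1 of the top-most frame holds a stack marker.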

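A memory structure is an element of $\mathrm{Mem} \stackrel{\mathrm{def}}{=} \mathrm{Map} \times \mathrm{Stack}$. Memory structures are denoted by $\sigma$, $\sigma_0$, $\sigma_1$ and so forth.

A value state is an element of $\mathrm{ValState} \stackrel{\mathrm{def}}{=} \mathrm{sVal} \times \mathrm{Mem}$. Value states are denoted by $\upsilon$, $\upsilon_0$, $\upsilon_1$ and so forth.

An exception state is an element of $\mathrm{ExceptState} \stackrel{\mathrm{def}}{=} \mathrm{Mem} \times \mathrm{Except}$. Exception states are denoted by $\varepsilon$, $\varepsilon_0$, $\varepsilon_1$ and so forth.

The overloading $@ : \mathrm{Mem} \times \mathrm{Addr} \rightarrowtail \mathrm{Loc}$ of the partial infix operator $@$ is defined, for each $\sigma = (\mu, w)$ and $a \in \mathrm{Addr}$, as follows and under the following conditions:
$$\sigma @ a \stackrel{\mathrm{def}}{=} \begin{cases} a, & \text{if } a \in \mathrm{Loc}; \\ l, & \text{if } a \in \mathrm{Ind} \text{ and } l = w @ a \text{ is defined.} \end{cases}$$
The memory structure read and update operators
$$\begin{aligned} \cdot[\cdot, \cdot] &: (\mathrm{Mem} \times \mathrm{Addr} \times \mathrm{sType}) \to (\mathrm{ValState} \uplus \mathrm{ExceptState}), \\ \cdot[\cdot := \cdot] &: (\mathrm{Mem} \times (\mathrm{Addr} \times \mathrm{sType}) \times \mathrm{sVal}) \to (\mathrm{Mem} \uplus \mathrm{ExceptState}) \end{aligned}$$
are respectively defined, for each $\sigma = (\mu, w) \in \mathrm{Mem}$, $a \in \mathrm{Addr}$, $\mathrm{sT} \in \mathrm{sType}$ and $\mathrm{sval} \in \mathrm{sVal}$, as follows: let $d = (\sigma @ a, \mathrm{sT})$; then
$$\sigma[a, \mathrm{sT}] \stackrel{\mathrm{def}}{=} \begin{cases} (\mu(d), \sigma), & \text{if } d \in \mathrm{dom}(\mu); \\ (\sigma, \mathtt{memerror}), & \text{otherwise;} \end{cases}$$
$$\sigma[(a, \mathrm{sT}) := \mathrm{sval}] \stackrel{\mathrm{def}}{=} \begin{cases} (\mu[d := \mathrm{sval}], w), & \text{if } d \in \mathrm{dom}(\mu) \text{ and } \mathrm{sT} = \mathrm{type}(\mathrm{sval}); \\ (\sigma, \mathtt{memerror}), & \text{otherwise.} \end{cases}$$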
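The following is a minimal sketch of σ @ a and of the read and update operators. It assumes, by our own choice rather than the paper's, that a memory structure is a pair of a Python dict μ and a list w, that absolute locations are strings and indirect locators are ints, and that members of the disjoint unions are tagged tuples. Since μ'(l, sT') is left unspecified for sT' ≠ sT, the update simply keeps the old entries, which is one of the admissible choices.

```python
from typing import Dict, List, Optional, Tuple, Union

STACK_MARK, FRAME_MARK = "†", "‡"
Map = Dict[Tuple[str, str], object]   # μ: (location, storable type) -> storable value
Mem = Tuple[Map, List[str]]           # σ = (μ, w)
Addr = Union[str, int]                # str: absolute location; int: indirect locator


def type_of(sval: object) -> str:
    # bool is tested first because bool is a subclass of int in Python
    return "boolean" if isinstance(sval, bool) else "integer"


def resolve(sigma: Mem, a: Addr) -> Optional[str]:
    """σ @ a, or None where the partial operator is undefined."""
    _, w = sigma
    if isinstance(a, str):
        return a
    frame_start = max((k + 1 for k, z in enumerate(w) if z == FRAME_MARK), default=0)
    tf = w[frame_start:]
    return tf[a] if 0 <= a < len(tf) and tf[a] != STACK_MARK else None


def read(sigma: Mem, a: Addr, sT: str):
    """σ[a, sT]: the value state ('val', sval, σ) or ('exc', σ, 'memerror')."""
    mu, _ = sigma
    d = (resolve(sigma, a), sT)       # (None, sT) is never in dom(μ)
    if d in mu:
        return ("val", mu[d], sigma)
    return ("exc", sigma, "memerror")


def update(sigma: Mem, a: Addr, sT: str, sval: object):
    """σ[(a, sT) := sval]: ('mem', σ') or ('exc', σ, 'memerror')."""
    mu, w = sigma
    d = (resolve(sigma, a), sT)
    if d in mu and sT == type_of(sval):
        new_mu = dict(mu)             # copy: the input memory structure is left unchanged
        new_mu[d] = sval              # entries (l, sT') with sT' != sT are simply kept
        return ("mem", (new_mu, w))
    return ("exc", sigma, "memerror")
```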
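The data and stack memory allocation functions
$$\begin{aligned} \mathrm{new}_d &: \mathrm{ValState} \to \bigl((\mathrm{Mem} \times \mathrm{Loc}) \uplus \mathrm{ExceptState}\bigr), \\ \mathrm{new}_s &: \mathrm{ValState} \to \bigl((\mathrm{Mem} \times \mathrm{Ind}) \uplus \mathrm{ExceptState}\bigr) \end{aligned}$$
are defined, for each $\upsilon = (\mathrm{sval}, \sigma) \in \mathrm{ValState}$, where $\sigma = (\mu, w)$, by
$$\mathrm{new}_d(\upsilon) \stackrel{\mathrm{def}}{=} \begin{cases} ((\mu', w), l), & \text{if the data segment of } \sigma \text{ can be extended;} \\ (\sigma, \mathtt{datovflw}), & \text{otherwise;} \end{cases}$$
$$\mathrm{new}_s(\upsilon) \stackrel{\mathrm{def}}{=} \begin{cases} ((\mu', w'), i), & \text{if the stack segment of } \sigma \text{ can be extended;} \\ (\sigma, \mathtt{stkovflw}), & \text{otherwise;} \end{cases}$$
where, in the case of $\mathrm{new}_s$, $w' \in \mathrm{Stack}$ and $i \in \mathrm{Ind}$ are such that:

(i) $w' = w \cdot l$;
(ii) $i = |\mathrm{tf}(w)|$;

and, for both $\mathrm{new}_d$ and $\mathrm{new}_s$, $\mu' \in \mathrm{Map}$ and $l \in \mathrm{Loc}$ are such that:

(iii) for each $\mathrm{sT} \in \mathrm{sType}$, $(l, \mathrm{sT}) \notin \mathrm{dom}(\mu)$;
(iv) for each $(l', \mathrm{sT}') \in \mathrm{dom}(\mu)$, $\mu'(l', \mathrm{sT}') = \mu(l', \mathrm{sT}')$;
(v) $\mu'(l, \mathrm{type}(\mathrm{sval})) = \mathrm{sval}$.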
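Allocation can be sketched in the same style. In this sketch the informal overflow conditions are modelled by two hypothetical capacity constants, and fresh locations get hypothetical `d` and `s` prefixes so that the two segments can be told apart. Memory structures are encoded, again hypothetically, as a dict keyed by (location, type) paired with a list. The comments point to conditions (i) to (v). This is one implementation compatible with the definition under those assumptions, not the one it prescribes.

```python
import itertools
from typing import Dict, List

STACK_MARK, FRAME_MARK = "†", "‡"
DATA_LIMIT = 4    # hypothetical: stands in for "the data segment can be extended"
STACK_LIMIT = 4   # hypothetical: stands in for "the stack segment can be extended"


def type_of(sval: object) -> str:
    return "boolean" if isinstance(sval, bool) else "integer"


def top_frame(w: List[str]) -> List[str]:
    start = max((k + 1 for k, z in enumerate(w) if z == FRAME_MARK), default=0)
    return w[start:]


def fresh_loc(mu: Dict, w: List[str], prefix: str) -> str:
    """A location l with (l, sT) not in dom(μ) for every sT (condition (iii)),
    also absent from w so that w · l is still a stack."""
    used = {l for (l, _) in mu} | set(w)
    return next(f"{prefix}{n}" for n in itertools.count() if f"{prefix}{n}" not in used)


def new_d(upsilon):
    sval, (mu, w) = upsilon
    if len({l for (l, _) in mu if l.startswith("d")}) >= DATA_LIMIT:
        return ("exc", (mu, w), "datovflw")
    l = fresh_loc(mu, w, "d")
    mu1 = {**mu, (l, type_of(sval)): sval}   # conditions (iv) and (v)
    return ("ok", (mu1, w), l)


def new_s(upsilon):
    sval, (mu, w) = upsilon
    if sum(z not in (STACK_MARK, FRAME_MARK) for z in w) >= STACK_LIMIT:
        return ("exc", (mu, w), "stkovflw")
    l = fresh_loc(mu, w, "s")
    i = len(top_frame(w))                    # condition (ii): i = |tf(w)|
    mu1 = {**mu, (l, type_of(sval)): sval}   # conditions (iv) and (v)
    return ("ok", (mu1, w + [l]), i)         # condition (i): w' = w · l
```

Condition (ii) is what makes the returned locator usable right away: after the allocation, the new location sits exactly at position i of the top-most frame.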

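The memory structure data cleanup function $\mathrm{cleanup}_d : \mathrm{ExceptState} \to \mathrm{ExceptState}$ is given, for each $\varepsilon = (\sigma, \xi) \in \mathrm{ExceptState}$, by
$$\mathrm{cleanup}_d(\varepsilon) \stackrel{\mathrm{def}}{=} \bigl((\emptyset, \epsilon), \xi\bigr),$$
where $\epsilon$ denotes the empty stack. The stack mark function $\mathrm{mark}_s : \mathrm{Mem} \to \mathrm{Mem}$ is given, for each $\sigma \in \mathrm{Mem}$, by
$$\mathrm{mark}_s(\sigma) \stackrel{\mathrm{def}}{=} (\mu, w \dagger), \quad \text{where } \sigma = (\mu, w).$$
The stack unmark partial function $\mathrm{unmark}_s : \mathrm{Mem} \rightarrowtail \mathrm{Mem}$ is given, for each $\sigma \in \mathrm{Mem}$ such that $\sigma = (\mu, w' \dagger w'')$ and $w'' \in \mathrm{Loc}^\star$, by
$$\mathrm{unmark}_s(\mu, w' \dagger w'') \stackrel{\mathrm{def}}{=} (\mu', w'),$$
where the absolute map $\mu' \in \mathrm{Map}$ satisfies:

(i) $\mathrm{dom}(\mu') = \{\, (l, \mathrm{sT}) \in \mathrm{dom}(\mu) \mid l \text{ does not occur in } w'' \,\}$;
(ii) $\mu' = \mu|_{\mathrm{dom}(\mu')}$.

The frame link partial function $\mathrm{link}_s : \mathrm{Mem} \rightarrowtail \mathrm{Mem}$ is given, for each $\sigma \in \mathrm{Mem}$ such that $\sigma = (\mu, w' \dagger w'')$ and $w'' \in \mathrm{Loc}^\star$, by
$$\mathrm{link}_s(\mu, w' \dagger w'') \stackrel{\mathrm{def}}{=} (\mu, w' \ddagger w'').$$
The frame unlink partial function $\mathrm{unlink}_s : \mathrm{Mem} \rightarrowtail \mathrm{Mem}$ is given, for each $\sigma \in \mathrm{Mem}$ such that $\sigma = (\mu, w' \ddagger w'')$ and $w'' \in \mathrm{Loc}^\star$, by
$$\mathrm{unlink}_s(\mu, w' \ddagger w'') \stackrel{\mathrm{def}}{=} (\mu, w' \dagger w'').$$
For ease of notation, the stack unmark and the frame unlink partial functions are lifted to also work on exception states. Namely, for each $\varepsilon = (\sigma, \xi) \in \mathrm{ExceptState}$,
$$\mathrm{unmark}_s(\sigma, \xi) \stackrel{\mathrm{def}}{=} (\mathrm{unmark}_s(\sigma), \xi); \qquad \mathrm{unlink}_s(\sigma, \xi) \stackrel{\mathrm{def}}{=} (\mathrm{unlink}_s(\sigma), \xi).$$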
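The four marker-manipulating functions reduce to simple list surgery. In the following sketch, memory structures are encoded hypothetically as pairs of a dict keyed by (location, type) and a list of strings, and the undefined cases of the partial functions raise `ValueError`. The helper `_split_last` is our own: it finds the decomposition w' · marker · w'' with w'' ∈ Loc*.

```python
STACK_MARK, FRAME_MARK = "†", "‡"


def _split_last(w, marker):
    """Split w as w' · marker · w'' where w'' contains no marker at all."""
    for k in range(len(w) - 1, -1, -1):
        if w[k] in (STACK_MARK, FRAME_MARK):
            if w[k] != marker:
                raise ValueError("undefined: the last marker is not " + marker)
            return w[:k], w[k + 1:]
    raise ValueError("undefined: no marker in the stack")


def mark_s(sigma):
    mu, w = sigma
    return (mu, w + [STACK_MARK])


def unmark_s(sigma):
    mu, w = sigma
    w1, w2 = _split_last(w, STACK_MARK)
    # drop every binding whose location occurs in the popped segment w''
    mu1 = {(l, sT): v for (l, sT), v in mu.items() if l not in w2}
    return (mu1, w1)


def link_s(sigma):
    mu, w = sigma
    w1, w2 = _split_last(w, STACK_MARK)
    return (mu, w1 + [FRAME_MARK] + w2)    # the last † becomes ‡


def unlink_s(sigma):
    mu, w = sigma
    w1, w2 = _split_last(w, FRAME_MARK)
    return (mu, w1 + [STACK_MARK] + w2)    # the last ‡ becomes † again
```

Note that `link_s` and `unlink_s` leave μ untouched; only `unmark_s` actually deallocates, which matches the definitions above.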

Intuitively, global variables are allocated in the data segment using $\mathrm{new}_d$ and are accessed through absolute locations; function $\mathrm{cleanup}_d$ models their deallocation due to an RTS exception thrown during the program start-up phase. The functions $\mathrm{mark}_s$ and $\mathrm{unmark}_s$ use the stack marker '†' to implement the automatic allocation (through $\mathrm{new}_s$) and deallocation of stack slots for storing local variables, return values and actual arguments of function calls. The functions $\mathrm{link}_s$ and $\mathrm{unlink}_s$ use the frame marker '‡' to partition the stack into activation frames, each frame corresponding to a function call. All accesses to the top-most frame can be expressed in terms of indirect locators (i.e., offsets from the top-most frame marker), because at each program point the layout of the current top-most frame is statically known. As will become clearer when considering the concrete rules for function calls, the frame marker is used to move the return value and the actual arguments, which are allocated by the caller, from the activation frame of the caller to the activation frame of the callee, and vice versa.

The memory structures and operations satisfy the following property: for each pair of memory structures $\sigma_0$ and $\sigma_1$ such that $\sigma_1$ has been obtained from $\sigma_0$ by any sequence of operations where each $\mathrm{link}_s$ is matched by a corresponding $\mathrm{unlink}_s$, for each indirect locator $i \in \mathrm{Ind}$, if $\sigma_0 @ i$ and $\sigma_1 @ i$ are both defined, then $\sigma_0 @ i = \sigma_1 @ i$.

As anticipated, we profit from the lack of aliasing in the fragment of CPM considered here, i.e., we assume there is no overlap between the storage cells associated with $(l_0, \mathrm{sT}_0)$ and the ones associated with $(l_1, \mathrm{sT}_1)$, unless $l_0 = l_1$. Moreover, we need not specify the relationship between $\mu(l, \mathrm{sT}_0)$ and $\mu(l, \mathrm{sT}_1)$ for the case where $\mathrm{sT}_0 \neq \mathrm{sT}_1$. This also implies that the absolute map update operator is underspecified, resulting in a nondeterministic operator. Of course, any real implementation will be characterized by a complete specification: for instance, a precise definition of the memory overflow conditions will take the place of the informal conditions "if the data (resp., stack) segment of σ can be extended" in the definitions of $\mathrm{new}_d$ and $\mathrm{new}_s$. Since, in the definitions above, memory is writable if and only if it is readable, we do not attempt to model read-only memory. It is also worth observing that, in the sequel, the "meaning" of variable identifiers will depend on unrestricted elements of $\mathrm{Env} \times \mathrm{Mem}$. As a consequence we can have dangling references, that is, a pair $(\rho, \sigma) \in \mathrm{Env} \times \mathrm{Mem}$ with $\rho : I$ can be such that there exists an identifier $\mathrm{id} \in I$ for which $\rho(\mathrm{id}) = (a, \mathrm{sT})$ and $\sigma[a, \mathrm{sT}] = (\sigma, \mathtt{memerror})$.

5.4 Configurations 

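The dynamic semantics of CPM is expressed by means of an evaluation (or reduction) relation, which specifies how a non-terminal configuration is reduced to a terminal configuration. The sets of non-terminal configurations are parametric with respect to a type environment associating each identifier with its type.

Definition 5.3. (Non-terminal configurations.) The sets of non-terminal configurations for expressions, local and global declarations, statements, function bodies and catch clauses are given, respectively and for each $\beta \in \mathrm{TEnv}_I$, by
$$\begin{aligned}
\Gamma_e^\beta &\stackrel{\mathrm{def}}{=} \{\, \langle e, \sigma \rangle \in \mathrm{Exp} \times \mathrm{Mem} \mid \exists \mathrm{sT} \in \mathrm{sType} .\ \beta \vdash_I e : \mathrm{sT} \,\}, \\
\Gamma_d^\beta &\stackrel{\mathrm{def}}{=} \{\, \langle d, \sigma \rangle \in \mathrm{Decl} \times \mathrm{Mem} \mid \exists \delta \in \mathrm{TEnv} .\ \beta \vdash_I d : \delta \,\}, \\
\Gamma_g^\beta &\stackrel{\mathrm{def}}{=} \{\, \langle g, \sigma \rangle \in \mathrm{Glob} \times \mathrm{Mem} \mid \exists \delta \in \mathrm{TEnv} .\ \beta \vdash_I g : \delta \,\}, \\
\Gamma_s^\beta &\stackrel{\mathrm{def}}{=} \{\, \langle s, \sigma \rangle \in \mathrm{Stmt} \times \mathrm{Mem} \mid \beta \vdash_I s \,\}, \\
\Gamma_b^\beta &\stackrel{\mathrm{def}}{=} \{\, \langle \mathrm{body}, \sigma \rangle \in \mathrm{Body} \times \mathrm{Mem} \mid \exists \mathrm{sT} \in \mathrm{sType} .\ \beta \vdash_I \mathrm{body} : \mathrm{sT} \,\}, \\
\Gamma_k^\beta &\stackrel{\mathrm{def}}{=} \{\, \langle k, \varepsilon \rangle \in \mathrm{Catch} \times \mathrm{ExceptState} \mid \beta \vdash_I k \,\}.
\end{aligned}$$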

Each kind of terminal configuration has to allow for the possibility of both a 
non-exceptional and an exceptional computation path. 

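Definition 5.4. (Terminal configurations.) The sets of terminal configurations for expressions, local and global declarations, statements, function bodies and catch clauses are given, respectively, by
$$\begin{aligned}
T_e &\stackrel{\mathrm{def}}{=} \mathrm{ValState} \uplus \mathrm{ExceptState}, \\
T_d &\stackrel{\mathrm{def}}{=} T_g \stackrel{\mathrm{def}}{=} (\mathrm{Env} \times \mathrm{Mem}) \uplus \mathrm{ExceptState}, \\
T_s &\stackrel{\mathrm{def}}{=} T_b \stackrel{\mathrm{def}}{=} \mathrm{Mem} \uplus \mathrm{ExceptState}, \\
T_k &\stackrel{\mathrm{def}}{=} \bigl(\{\mathtt{caught}\} \times T_s\bigr) \uplus \bigl(\{\mathtt{uncaught}\} \times \mathrm{ExceptState}\bigr).
\end{aligned}$$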

Note that $T_e$ is defined as $\mathrm{ValState} \uplus \mathrm{ExceptState}$. As will be apparent from the concrete semantics, expressions never modify the memory structure, so $T_e$ could have been defined as $\mathrm{sVal} \uplus \mathrm{Except}$; but defining it as $\mathrm{ValState} \uplus \mathrm{ExceptState}$ simplifies the approximation relations in Section 6.

In the following, we write $N$ and $\eta$ to denote a non-terminal and a terminal concrete configuration, respectively. For clarity of notation, we often use angle brackets to highlight that a tuple is indeed representing a configuration. Angle brackets are not normally used for configurations made of a single element. Therefore, when $\varepsilon = (\sigma, \xi) \in \mathrm{ExceptState}$, we interchangeably write $\varepsilon \in T_s$ or $\langle \sigma, \xi \rangle \in T_s$, as well as $\langle \mathtt{caught}, \varepsilon \rangle \in T_k$ or $\langle \mathtt{caught}, (\sigma, \xi) \rangle \in T_k$.

A few explanatory words are needed for $T_k$. When the evaluation of a non-terminal configuration for catch clauses $\langle k, \varepsilon \rangle \in \Gamma_k^\beta$ yields the terminal configuration $\langle \mathtt{caught}, \eta \rangle \in T_k$, then the exception $\xi$ in $\varepsilon = (\sigma, \xi)$ was caught inside $k$ and $\eta \in T_s$ is the result of evaluating the corresponding exception handler statement; note that $\eta \in T_s$ may itself be another exception state, meaning that another exception was thrown during the evaluation of the exception handler statement. In contrast, when the resulting terminal configuration is $\langle \mathtt{uncaught}, \varepsilon \rangle \in T_k$, then the exception in $\varepsilon$ was not caught inside $k$ and will be propagated to the outer context.⁵



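⁵ Note that the labels caught and uncaught have been chosen for clarity but carry no special meaning: they are only needed for a correct application of the disjoint union construction, since $T_s \cap \mathrm{ExceptState} \neq \emptyset$.

5.5 Concrete Evaluation Relations

For convenience, in order to represent function closures, we extend the syntactic category of local declarations with (recursive) execution environments. These syntactic constructs are meant to be available only in the dynamic semantics (in non-terminal configurations): they cannot occur in the program text. Thus we have
$$\mathrm{Decl} \ni d ::= \ldots \mid \rho \mid \mathbf{rec}\ \rho$$
Consequently, if $\rho : I$ we define $\mathrm{DI}(\rho) \stackrel{\mathrm{def}}{=} \mathrm{DI}(\mathbf{rec}\ \rho) \stackrel{\mathrm{def}}{=} I$, $\mathrm{FI}(\rho) \stackrel{\mathrm{def}}{=} \bigcup_{\mathrm{id} \in I} \mathrm{FI}(\rho(\mathrm{id}))$ and $\mathrm{FI}(\mathbf{rec}\ \rho) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\rho) \setminus I$, where the function FI is defined on elements of dVal by $\mathrm{FI}(l, \mathrm{sT}) \stackrel{\mathrm{def}}{=} \mathrm{FI}(i, \mathrm{sT}) \stackrel{\mathrm{def}}{=} \emptyset$ and $\mathrm{FI}(\lambda\, \mathrm{fps} .\, \mathrm{body}) \stackrel{\mathrm{def}}{=} \mathrm{FI}(\mathrm{body}) \setminus \mathrm{DI}(\mathrm{fps})$. The static semantics is extended by adding the rules
$$\frac{\rho : \delta}{\beta \vdash_I \rho : \delta} \qquad \frac{\beta[\delta|_J] \vdash_{I \cup J} \rho : \delta}{\beta \vdash_I \mathbf{rec}\ \rho : \delta} \quad \text{if } J = \mathrm{FI}(\rho) \cap \mathrm{DI}(\rho) \text{ and } \forall \mathrm{id}, \mathrm{sT} : (\mathrm{id} \mapsto \mathrm{sT\ loc}) \notin \delta.$$
The concrete evaluation relations that complete the definition of the concrete semantics for CPM are defined, as usual, by structural induction from a set of rule schemata. The evaluation relations are of the form $\rho \vdash_\beta N \to \eta$, where $\beta \in \mathrm{TEnv}_I$, $\rho \in \mathrm{Env}_J$, $\rho : \beta|_J$ and, for some $q \in \{e, d, g, s, b, k\}$, $N \in \Gamma_q^\beta$ and $\eta \in T_q$.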

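5.5.1 Expressions

Constant.

$$\rho \vdash_\beta \langle \mathrm{con}, \sigma \rangle \to \langle \mathrm{con}, \sigma \rangle \tag{2}$$

Identifier.

$$\rho \vdash_\beta \langle \mathrm{id}, \sigma \rangle \to \sigma[\rho(\mathrm{id})] \tag{3}$$

Unary minus.

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle -e, \sigma \rangle \to \varepsilon} \tag{4}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \langle m, \sigma_0 \rangle}{\rho \vdash_\beta \langle -e, \sigma \rangle \to \langle -m, \sigma_0 \rangle} \tag{5}$$

Binary arithmetic operations. Let $\boxdot$ denote any abstract syntax operator in $\{+, -, *, /, \%\}$ and let $\circ \in \{+, -, \cdot, \div, \bmod\}$ be the corresponding arithmetic operation. Then the rules for addition, subtraction, multiplication, division and remainder are given by the following schemata:

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle e_0 \boxdot e_1, \sigma \rangle \to \varepsilon} \tag{6}$$

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \to \langle m_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \to \varepsilon}{\rho \vdash_\beta \langle e_0 \boxdot e_1, \sigma \rangle \to \varepsilon} \tag{7}$$

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \to \langle m_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \to \langle m_1, \sigma_1 \rangle}{\rho \vdash_\beta \langle e_0 \boxdot e_1, \sigma \rangle \to \langle m_0 \circ m_1, \sigma_1 \rangle} \quad \text{if } \boxdot \notin \{/, \%\} \text{ or } m_1 \neq 0 \tag{8}$$

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \to \langle m_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \to \langle 0, \sigma_1 \rangle}{\rho \vdash_\beta \langle e_0 \boxdot e_1, \sigma \rangle \to \langle \sigma_1, \mathtt{divbyzero} \rangle} \quad \text{if } \boxdot \in \{/, \%\} \tag{9}$$

Arithmetic tests. Let $\boxtimes \in \{=, \neq, <, \leq, \geq, >\}$ be an abstract syntax operator and denote with $\bowtie$ the corresponding test operation in $\mathbb{Z} \times \mathbb{Z} \to \mathrm{Bool}$. The rules for the arithmetic tests are then given by the following schemata:

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle e_0 \boxtimes e_1, \sigma \rangle \to \varepsilon} \tag{10}$$

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \to \langle m_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \to \varepsilon}{\rho \vdash_\beta \langle e_0 \boxtimes e_1, \sigma \rangle \to \varepsilon} \tag{11}$$

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \to \langle m_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \to \langle m_1, \sigma_1 \rangle}{\rho \vdash_\beta \langle e_0 \boxtimes e_1, \sigma \rangle \to \langle m_0 \bowtie m_1, \sigma_1 \rangle} \tag{12}$$

Negation.

$$\frac{\rho \vdash_\beta \langle b, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathbf{not}\ b, \sigma \rangle \to \varepsilon} \tag{13}$$

$$\frac{\rho \vdash_\beta \langle b, \sigma \rangle \to \langle t, \sigma_0 \rangle}{\rho \vdash_\beta \langle \mathbf{not}\ b, \sigma \rangle \to \langle \neg t, \sigma_0 \rangle} \tag{14}$$

Conjunction.

$$\frac{\rho \vdash_\beta \langle b_0, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle b_0\ \mathbf{and}\ b_1, \sigma \rangle \to \varepsilon} \tag{15}$$

$$\frac{\rho \vdash_\beta \langle b_0, \sigma \rangle \to \langle \mathrm{ff}, \sigma_0 \rangle}{\rho \vdash_\beta \langle b_0\ \mathbf{and}\ b_1, \sigma \rangle \to \langle \mathrm{ff}, \sigma_0 \rangle} \tag{16}$$

$$\frac{\rho \vdash_\beta \langle b_0, \sigma \rangle \to \langle \mathrm{tt}, \sigma_0 \rangle \quad \rho \vdash_\beta \langle b_1, \sigma_0 \rangle \to \eta}{\rho \vdash_\beta \langle b_0\ \mathbf{and}\ b_1, \sigma \rangle \to \eta} \tag{17}$$

Disjunction.

$$\frac{\rho \vdash_\beta \langle b_0, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle b_0\ \mathbf{or}\ b_1, \sigma \rangle \to \varepsilon} \tag{18}$$

$$\frac{\rho \vdash_\beta \langle b_0, \sigma \rangle \to \langle \mathrm{tt}, \sigma_0 \rangle}{\rho \vdash_\beta \langle b_0\ \mathbf{or}\ b_1, \sigma \rangle \to \langle \mathrm{tt}, \sigma_0 \rangle} \tag{19}$$

$$\frac{\rho \vdash_\beta \langle b_0, \sigma \rangle \to \langle \mathrm{ff}, \sigma_0 \rangle \quad \rho \vdash_\beta \langle b_1, \sigma_0 \rangle \to \eta}{\rho \vdash_\beta \langle b_0\ \mathbf{or}\ b_1, \sigma \rangle \to \eta} \tag{20}$$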
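To make the expression rules concrete, here is a small big-step evaluator. It is a sketch under several assumptions of our own: the tuple-based AST, its tag names and the helper `simple_read` are hypothetical, the rounding of `/` and `%` follows Python's floor division (the schemata do not fix it), and outcomes are tagged tuples mirroring $T_e$. Comments give the rule numbers that each branch is meant to follow; note in particular the short-circuit behaviour of rules (16) and (19).

```python
import operator

# Hypothetical AST encoding:
#   ("con", v) | ("id", name) | ("neg", e) | ("not", b)
#   | ("bin", op, e0, e1) | ("and", b0, b1) | ("or", b0, b1)
# Outcomes mirror T_e = ValState ⊎ ExceptState: ("val", v, σ) or ("exc", σ, ξ).
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "/": operator.floordiv, "%": operator.mod}   # rounding of / and %: an assumption
TESTS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
         "<=": operator.le, ">=": operator.ge, ">": operator.gt}


def simple_read(sigma, d):
    """Toy stand-in for σ[a, sT]: σ is a dict keyed by (address, type)."""
    return ("val", sigma[d], sigma) if d in sigma else ("exc", sigma, "memerror")


def eval_exp(e, rho, sigma, read=simple_read):
    tag = e[0]
    if tag == "con":                                      # rule (2)
        return ("val", e[1], sigma)
    if tag == "id":                                       # rule (3): σ[ρ(id)]
        return read(sigma, rho[e[1]])
    if tag in ("neg", "not"):                             # rules (4)-(5), (13)-(14)
        r = eval_exp(e[1], rho, sigma, read)
        if r[0] == "exc":
            return r
        v = -r[1] if tag == "neg" else not r[1]
        return ("val", v, r[2])
    if tag in ("and", "or"):                              # rules (15)-(20)
        r0 = eval_exp(e[1], rho, sigma, read)
        if r0[0] == "exc" or r0[1] == (tag == "or"):      # exception, or short circuit
            return r0
        return eval_exp(e[2], rho, r0[2], read)
    if tag == "bin":                                      # rules (6)-(12)
        op = e[1]
        r0 = eval_exp(e[2], rho, sigma, read)
        if r0[0] == "exc":
            return r0                                     # rules (6), (10)
        r1 = eval_exp(e[3], rho, r0[2], read)
        if r1[0] == "exc":
            return r1                                     # rules (7), (11)
        m0, m1, sigma1 = r0[1], r1[1], r1[2]
        if op in TESTS:
            return ("val", TESTS[op](m0, m1), sigma1)     # rule (12)
        if op in ("/", "%") and m1 == 0:
            return ("exc", sigma1, "divbyzero")           # rule (9)
        return ("val", ARITH[op](m0, m1), sigma1)         # rule (8)
    raise ValueError(f"unknown expression: {e!r}")
```

Because the right operand of `and`/`or` is evaluated only when rules (17) or (20) apply, an expression such as `ff and (1 % 0 == 0)` evaluates to ff without raising `divbyzero`.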
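5.5.2 Declarations

Nil.

$$\rho \vdash_\beta \langle \mathbf{nil}, \sigma \rangle \to \langle \emptyset, \sigma \rangle \tag{21}$$

Environment.

$$\rho \vdash_\beta \langle \rho_0, \sigma \rangle \to \langle \rho_0, \sigma \rangle \tag{22}$$

Recursive environment.

$$\rho \vdash_\beta \langle \mathbf{rec}\ \rho_0, \sigma \rangle \to \langle \rho_1, \sigma \rangle \tag{23}$$

if

$$\begin{aligned} \rho_1 = {} & \{\, \mathrm{id} \mapsto \rho_0(\mathrm{id}) \mid \rho_0(\mathrm{id}) = \lambda\, \mathrm{fps} .\, \mathbf{extern} : \mathrm{sT} \,\} \\ \cup {} & \{\, \mathrm{id} \mapsto \mathrm{abs}_1 \mid \forall i \in \{0, 1\} : \mathrm{abs}_i = \lambda\, \mathrm{fps} .\, \mathbf{let}\ d_i\ \mathbf{in}\ s\ \mathbf{result}\ e, \\ & \qquad \rho_0(\mathrm{id}) = \mathrm{abs}_0,\ d_1 = \mathbf{rec}\,(\rho_0 \setminus \mathrm{DI}(\mathrm{fps}));\, d_0 \,\}. \end{aligned}$$

Global variable declaration.

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathbf{gvar}\ \mathrm{id} : \mathrm{sT} = e, \sigma \rangle \to \mathrm{cleanup}_d(\varepsilon)} \tag{24}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \upsilon}{\rho \vdash_\beta \langle \mathbf{gvar}\ \mathrm{id} : \mathrm{sT} = e, \sigma \rangle \to \mathrm{cleanup}_d(\varepsilon)} \quad \text{if } \mathrm{new}_d(\upsilon) = \varepsilon \tag{25}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \upsilon}{\rho \vdash_\beta \langle \mathbf{gvar}\ \mathrm{id} : \mathrm{sT} = e, \sigma \rangle \to \langle \rho_1, \sigma_1 \rangle} \quad \text{if } \mathrm{new}_d(\upsilon) = (\sigma_1, l) \text{ and } \rho_1 = \{\mathrm{id} \mapsto (l, \mathrm{sT})\} \tag{26}$$

Local variable declaration.

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathbf{lvar}\ \mathrm{id} : \mathrm{sT} = e, \sigma \rangle \to \mathrm{unmark}_s(\varepsilon)} \tag{27}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \upsilon}{\rho \vdash_\beta \langle \mathbf{lvar}\ \mathrm{id} : \mathrm{sT} = e, \sigma \rangle \to \mathrm{unmark}_s(\varepsilon)} \quad \text{if } \mathrm{new}_s(\upsilon) = \varepsilon \tag{28}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \upsilon}{\rho \vdash_\beta \langle \mathbf{lvar}\ \mathrm{id} : \mathrm{sT} = e, \sigma \rangle \to \langle \rho_1, \sigma_1 \rangle} \quad \text{if } \mathrm{new}_s(\upsilon) = (\sigma_1, i) \text{ and } \rho_1 = \{\mathrm{id} \mapsto (i, \mathrm{sT})\} \tag{29}$$

Function declaration.

$$\rho \vdash_\beta \langle \mathbf{function}\ \mathrm{id}(\mathrm{fps}) = \mathrm{body}_0, \sigma \rangle \to \langle \rho_0, \sigma \rangle \tag{30}$$

if $\rho_0 = \{\mathrm{id} \mapsto \lambda\, \mathrm{fps} .\, \mathrm{body}_1\}$ and either $\mathrm{body}_0 = \mathrm{body}_1 = \mathbf{extern} : \mathrm{sT}$ or, for each $i \in \{0, 1\}$, $\mathrm{body}_i = \mathbf{let}\ d_i\ \mathbf{in}\ s\ \mathbf{result}\ e$, $I = \mathrm{FI}(\mathrm{body}_0) \setminus \mathrm{DI}(\mathrm{fps})$ and $d_1 = \rho|_I; d_0$.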
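Recursive declaration.

$$\frac{(\rho \setminus J) \vdash_{\beta[\beta_1]} \langle g, \sigma \rangle \to \langle \rho_0, \sigma_0 \rangle \quad \rho \vdash_{\beta[\beta_1]} \langle \mathbf{rec}\ \rho_0, \sigma_0 \rangle \to \eta}{\rho \vdash_\beta \langle \mathbf{rec}\ g, \sigma \rangle \to \eta} \tag{31}$$

if $J = \mathrm{FI}(g) \cap \mathrm{DI}(g)$, $\beta \vdash_{\mathrm{FI}(g)} g : \beta_0$ and $\beta_1 = \beta_0|_J$.

Global sequential composition.

$$\frac{\rho \vdash_\beta \langle g_0, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle g_0; g_1, \sigma \rangle \to \varepsilon} \tag{32}$$

$$\frac{\rho \vdash_\beta \langle g_0, \sigma \rangle \to \langle \rho_0, \sigma_0 \rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle g_1, \sigma_0 \rangle \to \varepsilon}{\rho \vdash_\beta \langle g_0; g_1, \sigma \rangle \to \varepsilon} \quad \text{if } \beta \vdash_{\mathrm{FI}(g_0)} g_0 : \beta_0 \tag{33}$$

$$\frac{\rho \vdash_\beta \langle g_0, \sigma \rangle \to \langle \rho_0, \sigma_0 \rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle g_1, \sigma_0 \rangle \to \langle \rho_1, \sigma_1 \rangle}{\rho \vdash_\beta \langle g_0; g_1, \sigma \rangle \to \langle \rho_0[\rho_1], \sigma_1 \rangle} \quad \text{if } \beta \vdash_{\mathrm{FI}(g_0)} g_0 : \beta_0 \tag{34}$$

Local sequential composition.

$$\frac{\rho \vdash_\beta \langle d_0, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle d_0; d_1, \sigma \rangle \to \varepsilon} \tag{35}$$

$$\frac{\rho \vdash_\beta \langle d_0, \sigma \rangle \to \langle \rho_0, \sigma_0 \rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle d_1, \sigma_0 \rangle \to \varepsilon}{\rho \vdash_\beta \langle d_0; d_1, \sigma \rangle \to \varepsilon} \quad \text{if } \beta \vdash_{\mathrm{FI}(d_0)} d_0 : \beta_0 \tag{36}$$

$$\frac{\rho \vdash_\beta \langle d_0, \sigma \rangle \to \langle \rho_0, \sigma_0 \rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle d_1, \sigma_0 \rangle \to \langle \rho_1, \sigma_1 \rangle}{\rho \vdash_\beta \langle d_0; d_1, \sigma \rangle \to \langle \rho_0[\rho_1], \sigma_1 \rangle} \quad \text{if } \beta \vdash_{\mathrm{FI}(d_0)} d_0 : \beta_0 \tag{37}$$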

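5.5.3 Statements

Nop.

$$\rho \vdash_\beta \langle \mathbf{nop}, \sigma \rangle \to \sigma \tag{38}$$

Assignment.

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathrm{id} := e, \sigma \rangle \to \varepsilon} \tag{39}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \langle \mathrm{sval}, \sigma_0 \rangle}{\rho \vdash_\beta \langle \mathrm{id} := e, \sigma \rangle \to \sigma_0[\rho(\mathrm{id}) := \mathrm{sval}]} \tag{40}$$

Statement sequence.

$$\frac{\rho \vdash_\beta \langle s_0, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle s_0; s_1, \sigma \rangle \to \varepsilon} \tag{41}$$

$$\frac{\rho \vdash_\beta \langle s_0, \sigma \rangle \to \sigma_0 \quad \rho \vdash_\beta \langle s_1, \sigma_0 \rangle \to \eta}{\rho \vdash_\beta \langle s_0; s_1, \sigma \rangle \to \eta} \tag{42}$$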
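Block.

$$\frac{\rho \vdash_\beta \langle d, \mathrm{mark}_s(\sigma) \rangle \to \varepsilon}{\rho \vdash_\beta \langle d; s, \sigma \rangle \to \varepsilon} \tag{43}$$

$$\frac{\rho \vdash_\beta \langle d, \mathrm{mark}_s(\sigma) \rangle \to \langle \rho_0, \sigma_0 \rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle s, \sigma_0 \rangle \to \eta}{\rho \vdash_\beta \langle d; s, \sigma \rangle \to \mathrm{unmark}_s(\eta)} \quad \text{if } \beta \vdash_{\mathrm{FI}(d)} d : \beta_0 \tag{44}$$

Conditional.

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathbf{if}\ e\ \mathbf{then}\ s_0\ \mathbf{else}\ s_1, \sigma \rangle \to \varepsilon} \tag{45}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \langle \mathrm{tt}, \sigma_0 \rangle \quad \rho \vdash_\beta \langle s_0, \sigma_0 \rangle \to \eta}{\rho \vdash_\beta \langle \mathbf{if}\ e\ \mathbf{then}\ s_0\ \mathbf{else}\ s_1, \sigma \rangle \to \eta} \tag{46}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \langle \mathrm{ff}, \sigma_0 \rangle \quad \rho \vdash_\beta \langle s_1, \sigma_0 \rangle \to \eta}{\rho \vdash_\beta \langle \mathbf{if}\ e\ \mathbf{then}\ s_0\ \mathbf{else}\ s_1, \sigma \rangle \to \eta} \tag{47}$$

While.

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathbf{while}\ e\ \mathbf{do}\ s, \sigma \rangle \to \varepsilon} \tag{48}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \langle \mathrm{ff}, \sigma_0 \rangle}{\rho \vdash_\beta \langle \mathbf{while}\ e\ \mathbf{do}\ s, \sigma \rangle \to \sigma_0} \tag{49}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \langle \mathrm{tt}, \sigma_0 \rangle \quad \rho \vdash_\beta \langle s, \sigma_0 \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathbf{while}\ e\ \mathbf{do}\ s, \sigma \rangle \to \varepsilon} \tag{50}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \langle \mathrm{tt}, \sigma_0 \rangle \quad \rho \vdash_\beta \langle s, \sigma_0 \rangle \to \sigma_1 \quad \rho \vdash_\beta \langle \mathbf{while}\ e\ \mathbf{do}\ s, \sigma_1 \rangle \to \eta}{\rho \vdash_\beta \langle \mathbf{while}\ e\ \mathbf{do}\ s, \sigma \rangle \to \eta} \tag{51}$$

Throw.

$$\rho \vdash_\beta \langle \mathbf{throw}\ \chi, \sigma \rangle \to \langle \sigma, \chi \rangle \tag{52}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathbf{throw}\ e, \sigma \rangle \to \varepsilon} \tag{53}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \to \langle \mathrm{sval}, \sigma_0 \rangle}{\rho \vdash_\beta \langle \mathbf{throw}\ e, \sigma \rangle \to \langle \sigma_0, \mathrm{sval} \rangle} \tag{54}$$

Try blocks.

$$\frac{\rho \vdash_\beta \langle s, \sigma \rangle \to \sigma_0}{\rho \vdash_\beta \langle \mathbf{try}\ s\ \mathbf{catch}\ k, \sigma \rangle \to \sigma_0} \tag{55}$$

$$\frac{\rho \vdash_\beta \langle s, \sigma \rangle \to \varepsilon_0 \quad \rho \vdash_\beta \langle k, \varepsilon_0 \rangle \to \langle u, \eta \rangle}{\rho \vdash_\beta \langle \mathbf{try}\ s\ \mathbf{catch}\ k, \sigma \rangle \to \eta} \quad \text{if } u \in \{\mathtt{caught}, \mathtt{uncaught}\} \tag{56}$$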
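$$\frac{\rho \vdash_\beta \langle s_0, \sigma \rangle \to \sigma_0 \quad \rho \vdash_\beta \langle s_1, \sigma_0 \rangle \to \eta}{\rho \vdash_\beta \langle \mathbf{try}\ s_0\ \mathbf{finally}\ s_1, \sigma \rangle \to \eta} \tag{57}$$

$$\frac{\rho \vdash_\beta \langle s_0, \sigma \rangle \to \langle \sigma_0, \xi_0 \rangle \quad \rho \vdash_\beta \langle s_1, \sigma_0 \rangle \to \sigma_1}{\rho \vdash_\beta \langle \mathbf{try}\ s_0\ \mathbf{finally}\ s_1, \sigma \rangle \to \langle \sigma_1, \xi_0 \rangle} \tag{58}$$

$$\frac{\rho \vdash_\beta \langle s_0, \sigma \rangle \to \langle \sigma_0, \xi_0 \rangle \quad \rho \vdash_\beta \langle s_1, \sigma_0 \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathbf{try}\ s_0\ \mathbf{finally}\ s_1, \sigma \rangle \to \varepsilon} \tag{59}$$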
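The statement rules can be read the same way. The sketch below deliberately simplifies the setting. Memory is a plain dict from variable names to values, expressions are Python callables that never raise, and blocks, typed catch clauses and the ρ/σ distinction are omitted. A catch clause is a hypothetical pair of an exception value (or `"any"`) and a handler. The while loop runs iteratively: if the loop diverges, the function never returns, which corresponds to the absence of a finite derivation.

```python
# Hypothetical encoding: σ is a dict from variable names to values, expressions
# are Python callables on σ that never fail, and outcomes are ("mem", σ) or
# ("exc", σ, ξ), mirroring T_s = Mem ⊎ ExceptState.
def exec_stmt(s, sigma):
    tag = s[0]
    if tag == "nop":                                      # rule (38)
        return ("mem", sigma)
    if tag == "assign":                                   # rule (40)
        _, x, e = s
        return ("mem", {**sigma, x: e(sigma)})
    if tag == "seq":                                      # rules (41)-(42)
        r = exec_stmt(s[1], sigma)
        return r if r[0] == "exc" else exec_stmt(s[2], r[1])
    if tag == "if":                                       # rules (46)-(47)
        _, e, s0, s1 = s
        return exec_stmt(s0 if e(sigma) else s1, sigma)
    if tag == "while":                                    # rules (49)-(51), unrolled
        _, e, body = s
        while e(sigma):
            r = exec_stmt(body, sigma)
            if r[0] == "exc":
                return r                                  # rule (50)
            sigma = r[1]
        return ("mem", sigma)                             # rule (49)
    if tag == "throw":                                    # rule (54)
        return ("exc", sigma, s[1](sigma))
    if tag == "try_catch":                                # rules (55)-(56)
        _, body, clauses = s
        r = exec_stmt(body, sigma)
        if r[0] == "mem":
            return r
        _, sigma0, xi = r
        for pattern, handler in clauses:                  # rules (73)-(74): first match wins
            if pattern == "any" or pattern == xi:
                return exec_stmt(handler, sigma0)
        return r                                          # uncaught: propagate ε0
    if tag == "try_finally":                              # rules (57)-(59)
        _, s0, s1 = s
        r0 = exec_stmt(s0, sigma)
        if r0[0] == "mem":
            return exec_stmt(s1, r0[1])                   # rule (57)
        r1 = exec_stmt(s1, r0[1])
        if r1[0] == "mem":
            return ("exc", r1[1], r0[2])                  # rule (58): ξ0 survives
        return r1                                         # rule (59)
    raise ValueError(f"unknown statement: {s!r}")
```

The `try_finally` branch shows the key point of rules (58) and (59): the finalizer always runs on the memory left by the failed body, and its own exception, if any, replaces the pending one.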
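Function call. Consider the following conditions:

$$\begin{gathered} \beta(\mathrm{id}) = (\mathrm{fps} \to \mathrm{sT}_0), \qquad \rho(\mathrm{id}) = \lambda\, \mathrm{id}_1 : \mathrm{sT}_1, \ldots, \mathrm{id}_n : \mathrm{sT}_n .\, \mathrm{body}, \\ d = (\mathbf{lvar}\ \mathtt{x}_0 : \mathrm{sT}_0 = \mathrm{id}_0;\ \mathbf{lvar}\ \mathtt{x}_1 : \mathrm{sT}_1 = e_1; \ldots;\ \mathbf{lvar}\ \mathtt{x}_n : \mathrm{sT}_n = e_n) \end{gathered} \tag{60}$$

$$\rho_1 = \{\mathtt{x}_0 \mapsto (0, \mathrm{sT}_0)\} \cup \{\, \mathrm{id}_j \mapsto (j, \mathrm{sT}_j) \mid j = 1, \ldots, n \,\}, \qquad \rho_0 : \beta_0, \qquad \rho_1 : \beta_1. \tag{61}$$

Then the rule schemata for function calls are the following:

$$\frac{\rho \vdash_\beta \langle d, \mathrm{mark}_s(\sigma) \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathrm{id}_0 := \mathrm{id}(e_1, \ldots, e_n), \sigma \rangle \to \varepsilon} \quad \text{if (60) holds} \tag{62}$$

$$\frac{\rho \vdash_\beta \langle d, \mathrm{mark}_s(\sigma) \rangle \to \langle \rho_0, \sigma_0 \rangle \quad \rho[\rho_1] \vdash_{\beta[\beta_1]} \langle \mathrm{body}, \mathrm{link}_s(\sigma_0) \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathrm{id}_0 := \mathrm{id}(e_1, \ldots, e_n), \sigma \rangle \to \mathrm{unmark}_s(\mathrm{unlink}_s(\varepsilon))} \quad \text{if (60) and (61) hold} \tag{63}$$

$$\frac{\rho \vdash_\beta \langle d, \mathrm{mark}_s(\sigma) \rangle \to \langle \rho_0, \sigma_0 \rangle \quad \rho[\rho_1] \vdash_{\beta[\beta_1]} \langle \mathrm{body}, \mathrm{link}_s(\sigma_0) \rangle \to \sigma_1 \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle \mathrm{id}_0 := \mathtt{x}_0, \mathrm{unlink}_s(\sigma_1) \rangle \to \eta_2}{\rho \vdash_\beta \langle \mathrm{id}_0 := \mathrm{id}(e_1, \ldots, e_n), \sigma \rangle \to \mathrm{unmark}_s(\eta_2)} \quad \text{if (60) and (61) hold} \tag{64}$$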

Note that parameter passing is implemented by using reserved identifiers that reference the return value ($\mathtt{x}_0$) and the actual arguments ($\mathtt{x}_1, \ldots, \mathtt{x}_n$). When evaluating the function body (i.e., after linking a new activation frame), the callee can access the return value and the arguments' values by using the indirect locators $0$ and $1, \ldots, n$, respectively; to this end, the callee uses the environment $\rho_1$, where the reserved identifier $\mathtt{x}_0$ is still mapped to the return value, whereas the arguments are accessible using the formal parameters' names $\mathrm{id}_1, \ldots, \mathrm{id}_n$.
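The following sketch traces the stack discipline of rule (64) on a toy memory whose map is keyed by location only: types, exceptions and the final assignment to $\mathrm{id}_0$ are left out. The function names, the location generator and the representation of the callee as a Python function are hypothetical. It shows that the caller reaches the return slot through its own indirect locator, while the callee reaches the same slot as w @ 0.

```python
import itertools

STACK_MARK, FRAME_MARK = "†", "‡"
_fresh = itertools.count()   # hypothetical source of fresh location names


def top_frame(w):
    start = max((k + 1 for k, z in enumerate(w) if z == FRAME_MARK), default=0)
    return w[start:]


def at(w, i):
    """w @ i (this sketch assumes it is defined)."""
    return top_frame(w)[i]


def replace_last(w, old, new):
    k = len(w) - 1 - w[::-1].index(old)
    return w[:k] + [new] + w[k + 1:]


def new_s(mu, w, v):
    """Allocate a stack slot holding v; return (μ', w · l, i) with i = |tf(w)|."""
    l = f"l{next(_fresh)}"
    return {**mu, l: v}, w + [l], len(top_frame(w))


def call(mu, w, ret_init, args, body):
    """Rough reading of rule (64): returns (value of x0, μ, w) after the call."""
    w = w + [STACK_MARK]                          # mark_s
    mu, w, i_ret = new_s(mu, w, ret_init)         # lvar x0 : sT0 = id0
    for v in args:                                # lvar xj : sTj = ej
        mu, w, _ = new_s(mu, w, v)
    w = replace_last(w, STACK_MARK, FRAME_MARK)   # link_s
    mu = body(mu, w)                              # callee: x0 is w @ 0, arguments w @ 1..n
    w = replace_last(w, FRAME_MARK, STACK_MARK)   # unlink_s
    result = mu[at(w, i_ret)]                     # caller reads x0 via its own locator
    k = len(w) - 1 - w[::-1].index(STACK_MARK)
    popped = set(w[k + 1:])
    mu = {l: v for l, v in mu.items() if l not in popped}   # unmark_s
    return result, mu, w[:k]


def add_body(mu, w):
    """Hypothetical callee body: x0 := id1 + id2, through indirect locators only."""
    return {**mu, at(w, 0): mu[at(w, 1)] + mu[at(w, 2)]}


result, mu1, w1 = call({"g": 1}, ["g"], 0, [20, 22], add_body)
```

In this run the caller's locator for $\mathtt{x}_0$ is 2, because g and † precede the slot in the caller's top-most frame, yet after $\mathrm{unlink}_s$ it denotes the same location the callee accessed as w @ 0. This is the invariance of indirect locators stated in Section 5.3.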



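5.5.4 Function Bodies

$$\frac{\rho \vdash_\beta \langle d, \mathrm{mark}_s(\sigma) \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathbf{let}\ d\ \mathbf{in}\ s\ \mathbf{result}\ e, \sigma \rangle \to \varepsilon} \tag{65}$$

$$\frac{\rho \vdash_\beta \langle d, \mathrm{mark}_s(\sigma) \rangle \to \langle \rho_0, \sigma_0 \rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle s, \sigma_0 \rangle \to \varepsilon}{\rho \vdash_\beta \langle \mathbf{let}\ d\ \mathbf{in}\ s\ \mathbf{result}\ e, \sigma \rangle \to \mathrm{unmark}_s(\varepsilon)} \quad \text{if } \beta \vdash_{\mathrm{FI}(d)} d : \beta_0 \tag{66}$$

$$\frac{\rho \vdash_\beta \langle d, \mathrm{mark}_s(\sigma) \rangle \to \langle \rho_0, \sigma_0 \rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle s, \sigma_0 \rangle \to \sigma_1 \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle \mathtt{x}_0 := e, \sigma_1 \rangle \to \eta_0}{\rho \vdash_\beta \langle \mathbf{let}\ d\ \mathbf{in}\ s\ \mathbf{result}\ e, \sigma \rangle \to \mathrm{unmark}_s(\eta_0)} \quad \text{if } \beta \vdash_{\mathrm{FI}(d)} d : \beta_0 \tag{67}$$

$$\rho \vdash_\beta \langle \mathbf{extern} : \mathrm{sT}, (\mu, w) \rangle \to \eta \quad \text{if } \exists \sigma_0 = (\mu_0, w) \in \mathrm{Mem}, \xi \in \mathrm{Except} .\ \eta = \sigma_0 \vee \eta = \langle \sigma_0, \xi \rangle \tag{68}$$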
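5.5.5 Catch Clauses

Catch.

$$\frac{\rho \vdash_\beta \langle s, \sigma \rangle \to \eta_0}{\rho \vdash_\beta \langle (p)\, s, (\sigma, \xi) \rangle \to \langle \mathtt{caught}, \eta_0 \rangle} \quad \text{if } p = \xi \in \mathrm{RTSExcept}, \text{ or } p = \mathrm{type}(\xi), \text{ or } p = \mathbf{any} \tag{69}$$

$$\rho \vdash_\beta \langle (\mathrm{id} : \mathrm{sT})\, s, (\sigma, \mathrm{sval}) \rangle \to \langle \mathtt{caught}, \mathrm{unmark}_s(\varepsilon_0) \rangle \quad \text{if } \mathrm{sT} = \mathrm{type}(\mathrm{sval}) \text{ and } \varepsilon_0 = \mathrm{new}_s(\mathrm{sval}, \mathrm{mark}_s(\sigma)) \tag{70}$$

$$\frac{\rho[\{\mathrm{id} \mapsto (i, \mathrm{sT})\}] \vdash_{\beta[\{\mathrm{id} \mapsto \mathrm{sT\ loc}\}]} \langle s, \sigma_0 \rangle \to \eta_0}{\rho \vdash_\beta \langle (\mathrm{id} : \mathrm{sT})\, s, (\sigma, \mathrm{sval}) \rangle \to \langle \mathtt{caught}, \mathrm{unmark}_s(\eta_0) \rangle} \quad \text{if } \mathrm{sT} = \mathrm{type}(\mathrm{sval}) \text{ and } (\sigma_0, i) = \mathrm{new}_s(\mathrm{sval}, \mathrm{mark}_s(\sigma)) \tag{71}$$

$$\rho \vdash_\beta \langle (p)\, s, (\sigma, \xi) \rangle \to \langle \mathtt{uncaught}, (\sigma, \xi) \rangle \tag{72}$$

if, letting $\mathrm{cT} = \mathrm{type}(\xi)$, we have $p \notin \{\xi, \mathrm{cT}, \mathbf{any}\}$ and $\forall \mathrm{id} \in \mathrm{Id} : p \neq \mathrm{id} : \mathrm{cT}$.

Catch sequence.

$$\frac{\rho \vdash_\beta \langle k_0, \varepsilon \rangle \to \langle \mathtt{caught}, \eta_0 \rangle}{\rho \vdash_\beta \langle k_0; k_1, \varepsilon \rangle \to \langle \mathtt{caught}, \eta_0 \rangle} \tag{73}$$

$$\frac{\rho \vdash_\beta \langle k_0, \varepsilon \rangle \to \langle \mathtt{uncaught}, \varepsilon_0 \rangle \quad \rho \vdash_\beta \langle k_1, \varepsilon_0 \rangle \to \eta}{\rho \vdash_\beta \langle k_0; k_1, \varepsilon \rangle \to \eta} \tag{74}$$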
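The side conditions of rules (69) to (72), together with the first-match behaviour of rules (73) and (74), can be sketched as a small pattern matcher. The pattern encoding, the function names and the type tag `"rts_exception"` used for RTS exceptions are all hypothetical. RTS exceptions are represented as strings and user exceptions as Python ints or bools.

```python
def exc_type(xi):
    """type(ξ). Assumption: RTS exceptions are strings sharing one hypothetical
    type tag; user exceptions are storable values (int or bool)."""
    if isinstance(xi, str):
        return "rts_exception"
    return "boolean" if isinstance(xi, bool) else "integer"


def match(p, xi):
    """One catch clause. Patterns: ("rts", χ) | ("type", cT) | ("any",) | ("bind", id, sT).
    Returns ("caught", binding) or ("uncaught", None); binding is (id, ξ) or None."""
    kind = p[0]
    if kind == "any":
        return ("caught", None)                           # p = any
    if kind == "rts" and isinstance(xi, str) and p[1] == xi:
        return ("caught", None)                           # p = ξ ∈ RTSExcept
    if kind == "type" and p[1] == exc_type(xi):
        return ("caught", None)                           # p = type(ξ)
    if kind == "bind" and not isinstance(xi, str) and p[2] == exc_type(xi):
        return ("caught", (p[1], xi))                     # rules (70)-(71): bind id to ξ
    return ("uncaught", None)                             # rule (72)


def match_sequence(clauses, xi):
    """k0; k1; ...: the first clause that catches wins (rules (73)-(74)).
    Returns (index, binding), or None when ξ propagates to the outer context."""
    for index, p in enumerate(clauses):
        outcome, binding = match(p, xi)
        if outcome == "caught":
            return index, binding
    return None
```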
5.6 Concrete Divergence Relation 

In order to capture divergent computations, we follow the approach of Cousot and Cousot [CC92c], also advocated by Schmidt [Sch98] and Leroy [Ler06]. This amounts to introducing a divergence relation by means of sequents of the form $\rho \vdash_\beta N \xrightarrow{\infty}$, where $N \in \Gamma_q^\beta$ and $q \in \{s, b, k\}$. Intuitively, a divergence sequent of the form, say, $\rho \vdash_\beta \langle s, \sigma \rangle \xrightarrow{\infty}$ means that, in the context given by $\rho$ and $\sigma$, the execution of statement $s$ diverges. We now give a set of rules that (interpreted coinductively, as we will see later) allow us to characterize the behavior of divergent computations. For instance, the following rule schemata characterize the divergence behavior of statement sequences:

$$\frac{\rho \vdash_\beta \langle s_0, \sigma \rangle \xrightarrow{\infty}}{\rho \vdash_\beta \langle s_0; s_1, \sigma \rangle \xrightarrow{\infty}} \qquad \frac{\rho \vdash_\beta \langle s_0, \sigma \rangle \to \sigma_0 \quad \rho \vdash_\beta \langle s_1, \sigma_0 \rangle \xrightarrow{\infty}}{\rho \vdash_\beta \langle s_0; s_1, \sigma \rangle \xrightarrow{\infty}}$$

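Notice that, once the set of concrete rules characterizing finite computations is known, the concrete rules modeling divergences can be specified systematically (and thus implicitly). Namely, for each concrete rule

$$\frac{P_0 \quad \cdots \quad P_{i-1} \quad \rho_i \vdash_{\beta_i} N_i \to \eta_i \quad P_{i+1} \quad \cdots \quad P_{h-1}}{\rho \vdash_\beta N \to \eta} \quad (\text{side condition}) \tag{75}$$

such that $0 \leq i < h$ and $N_i \in \biguplus_{\beta' \in \mathrm{TEnv}} \Gamma_q^{\beta'}$ for some $q \in \{s, b, k\}$, there is the corresponding divergence rule where the $i$-th premise is diverging, i.e.,

$$\frac{P_0 \quad \cdots \quad P_{i-1} \quad \rho_i \vdash_{\beta_i} N_i \xrightarrow{\infty}}{\rho \vdash_\beta N \xrightarrow{\infty}} \quad (\text{side condition})$$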

Thus, both rules above modeling the divergence of statement sequences can be obtained from rule (42). It is worth noting that a single divergence rule schema can be obtained from more than one of the concrete rules in Section 5.5.
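Because negative rules are derived mechanically from positive ones, the construction can itself be written down. The sketch below represents sequents and rules symbolically; the classes, their field names and the string notation for configurations are hypothetical. It produces one negative rule per premise whose configuration belongs to a statement, a function body or a catch clause, keeping the premises before it and dropping those after it.

```python
from typing import List, NamedTuple, Tuple


class Sequent(NamedTuple):
    env: str      # e.g. "ρ"
    conf: str     # non-terminal configuration, e.g. "⟨s0, σ⟩"
    kind: str     # q ∈ {"e", "d", "g", "s", "b", "k"}
    result: str   # terminal configuration, or "∞" for a divergence sequent


class Rule(NamedTuple):
    premises: Tuple[Sequent, ...]
    conclusion: Sequent
    side: str = ""


def negative_rules(rule: Rule) -> List[Rule]:
    """Schema (75) -> its divergence rules: premise i diverges, P0..P(i-1) are
    kept, later premises are dropped, and the side condition is preserved."""
    out = []
    for i, p in enumerate(rule.premises):
        if p.kind in ("s", "b", "k"):    # expressions and declarations never diverge
            out.append(Rule(rule.premises[:i] + (p._replace(result="∞"),),
                            rule.conclusion._replace(result="∞"),
                            rule.side))
    return out
```

Applied to rule (41), the function yields the same schema as the first rule it derives from rule (42), which is the duplication mentioned in the text; collecting the results in a set removes it.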

We will use the terms negative and positive to distinguish the different kinds of 
rules constructed in this and the previous section, respectively. 

Definition 5.5. (Concrete semantics rules.) The set $\mathcal{R}_+$ (resp., $\mathcal{R}_-$) of positive (resp., negative) concrete semantics rules is the infinite set obtained by instantiating the rule schemata of Section 5.5 (resp., Section 5.6) in all possible ways (respecting, of course, the side conditions). Moreover, $\mathcal{R} \stackrel{\mathrm{def}}{=} \mathcal{R}_+ \uplus \mathcal{R}_-$.
5.7 Concrete Semantics Trees 

The concrete semantics of a program is a (possibly infinite) set of finite or infinite 
trees. Such trees are defined in terms of the (infinite) set of instances of the rules 
defined in the previous two sections. 

Let $\mathcal{S}$ be the (infinite) set of sequents occurring in the premises and conclusions of the rules in $\mathcal{R}$. The concrete semantics universe, denoted by $\mathcal{U}$, is the set of finitely branching trees of at most $\omega$-depth with labels in $\mathcal{S}$.

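Definition 5.6. (Concrete semantics universe.) A set $P \subseteq \mathbb{N}^\star$ is prefix-closed if, for each $z \in \mathbb{N}^\star$ and each $n \in \mathbb{N}$, $zn \in P$ implies $z \in P$. A set $P \subseteq \mathbb{N}^\star$ is canonical if, for each $z \in \mathbb{N}^\star$, there exists $h \in \mathbb{N}$ such that
$$\{\, n \in \mathbb{N} \mid zn \in P \,\} = \{0, \ldots, h-1\}.$$
An $\mathcal{S}$-tree is a partial function $\theta : \mathbb{N}^\star \rightarrowtail \mathcal{S}$ such that $\mathrm{dom}(\theta)$ is prefix-closed and canonical. The concrete semantics universe $\mathcal{U}$ is the set of all $\mathcal{S}$-trees.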
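For finite trees, the conditions of Definition 5.6 can be checked directly. In this sketch, which is ours, paths in $\mathbb{N}^\star$ are Python tuples (the empty tuple playing the role of $\epsilon$) and a tree is a dict from paths to labels. Infinite trees are, of course, out of reach of such an encoding.

```python
from typing import Dict, Iterable, Tuple

Path = Tuple[int, ...]   # an element of N*; () stands for the empty word ϵ


def is_prefix_closed(P: Iterable[Path]) -> bool:
    """z·n ∈ P implies z ∈ P; checking each path's parent suffices."""
    P = set(P)
    return all(z[:-1] in P for z in P if z)


def is_canonical(P: Iterable[Path]) -> bool:
    """For each z, the children {n | z·n ∈ P} must be exactly {0, ..., h-1}."""
    children: Dict[Path, set] = {}
    for z in P:
        if z:
            children.setdefault(z[:-1], set()).add(z[-1])
    return all(c == set(range(len(c))) for c in children.values())


def subtree(theta: Dict[Path, str], p: Path) -> Dict[Path, str]:
    """θ_[p], given by θ_[p](z) = θ(p·z)."""
    return {z[len(p):]: s for z, s in theta.items() if z[:len(p)] == p}
```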

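For each $p \in \mathrm{dom}(\theta)$, the tree $\theta_{[p]}$ defined, for each $z \in \mathbb{N}^\star$, by $\theta_{[p]}(z) \stackrel{\mathrm{def}}{=} \theta(pz)$, is called a subtree of $\theta$; it is called a proper subtree if $p \neq \epsilon$. If $\mathrm{dom}(\theta) = \emptyset$, then $\theta$ is the empty tree. If $\theta$ is not empty, then $\theta(\epsilon)$ is the root of $\theta$ and, if $\{0, \ldots, h-1\} \subseteq \mathrm{dom}(\theta)$ and $h \notin \mathrm{dom}(\theta)$, then $\theta_{[0]}, \ldots, \theta_{[h-1]}$ are its immediate subtrees (note that $h \in \mathbb{N}$ may be zero); in this case $\theta$ can be denoted by
$$\frac{\theta_{[0]} \quad \cdots \quad \theta_{[h-1]}}{\theta(\epsilon)}.$$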

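Definition 5.7. (Concrete semantics trees.) Let $\mathcal{F}_+ : \wp(\mathcal{U}) \to \wp(\mathcal{U})$ be the continuous function over the complete lattice $(\wp(\mathcal{U}), \subseteq)$ given, for all $U \in \wp(\mathcal{U})$, by
$$\mathcal{F}_+(U) \stackrel{\mathrm{def}}{=} \left\{ \frac{\theta_0 \quad \cdots \quad \theta_{h-1}}{s} \;\middle|\; \theta_0, \ldots, \theta_{h-1} \in U,\ \frac{\theta_0(\epsilon) \quad \cdots \quad \theta_{h-1}(\epsilon)}{s} \in \mathcal{R}_+ \right\}.$$
The set of positive concrete semantics trees is $\Theta_+ \stackrel{\mathrm{def}}{=} \mathrm{lfp}_\subseteq(\mathcal{F}_+)$. Consider now the co-continuous function $\mathcal{F}_- : \wp(\mathcal{U}) \to \wp(\mathcal{U})$ given, for each $U \in \wp(\mathcal{U})$, by
$$\mathcal{F}_-(U) \stackrel{\mathrm{def}}{=} \left\{ \frac{\theta_0 \quad \cdots \quad \theta_{h-1}}{s} \;\middle|\; \theta_0, \ldots, \theta_{h-2} \in \Theta_+,\ \theta_{h-1} \in U,\ \frac{\theta_0(\epsilon) \quad \cdots \quad \theta_{h-1}(\epsilon)}{s} \in \mathcal{R}_- \right\}.$$
The set of negative concrete semantics trees is $\Theta_- \stackrel{\mathrm{def}}{=} \mathrm{gfp}_\subseteq(\mathcal{F}_-)$. The set of all concrete semantics trees is $\Theta \stackrel{\mathrm{def}}{=} \Theta_+ \uplus \Theta_-$.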
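Over a finite, acyclic toy rule set, the least fixpoint $\Theta_+ = \mathrm{lfp}_\subseteq(\mathcal{F}_+)$ can be computed by Kleene iteration from the empty set. The following sketch is our own. Rules are pairs of a premise tuple and a conclusion, the sequent labels are hypothetical letters, and a tree is a pair (root, tuple of immediate subtrees). It only illustrates the inductive half of the definition: the greatest fixpoint defining $\Theta_-$ contains infinite trees and cannot be enumerated this way.

```python
from itertools import product


def F_plus(U, rules):
    """One application of F+: every tree whose root is the conclusion s of a
    rule (P0 ... P(h-1) / s) and whose immediate subtrees are trees of U
    rooted at P0, ..., P(h-1). Trees are encoded as (root, children)."""
    by_root = {}
    for t in U:
        by_root.setdefault(t[0], []).append(t)
    out = set()
    for premises, conclusion in rules:
        for kids in product(*(by_root.get(p, []) for p in premises)):
            out.add((conclusion, tuple(kids)))
    return out


def lfp(f):
    """Kleene iteration from the empty set; it terminates only if the
    ascending chain stabilizes after finitely many steps."""
    x = set()
    while True:
        y = f(x)
        if y == x:
            return x
        x = y


# Toy rules: axiom a, b from a, c from a and b, d from an underivable e.
rules = [((), "a"), (("a",), "b"), (("a", "b"), "c"), (("e",), "d")]
theta_plus = lfp(lambda U: F_plus(U, rules))
```

Since `d` depends on the underivable sequent `e`, no finite tree is rooted at `d`, so it is absent from `theta_plus`.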



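We now show that, for every concrete non-terminal configuration, there exists a concrete semantics tree with that configuration at its root.

Proposition 5.8. For each $\beta \in \mathrm{TEnv}$, $\rho \in \mathrm{Env}$ such that $\rho : \beta$ and $N \in \Gamma_q^\beta$, where $q \in \{e, d, g, s, b, k\}$, there exists $\theta \in \Theta$ such that
$$\theta(\epsilon) \in \{\, (\rho \vdash_\beta N \to \eta) \mid \eta \in T_q \,\} \uplus \{\, (\rho \vdash_\beta N \xrightarrow{\infty}) \,\}.$$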

PROOF. If $q = e$ and $\eta \in T_e$, we say that the sequent $(\rho \vdash_\beta N \to \eta)$ is well-typed if $N = \langle e, \sigma_0 \rangle$ and $\eta = \langle \mathrm{sval}, \sigma_1 \rangle$ imply $\beta \vdash_I e : \mathrm{type}(\mathrm{sval})$. For the proof, let
$$S_+(\rho, \beta, N) \stackrel{\mathrm{def}}{=} \{\, s \mid s = (\rho \vdash_\beta N \to \eta),\ \eta \in T_q,\ (q = e \implies s \text{ is well-typed}) \,\}.$$
We now assume that $N \in \Gamma_q^\beta$ is a fixed but arbitrary non-terminal configuration. It suffices to show there exists $\theta \in \Theta$ such that $\theta(\epsilon) \in S_+(\rho, \beta, N) \uplus \{(\rho \vdash_\beta N \xrightarrow{\infty})\}$. Let $R_0$ be the set of all rules in $\mathcal{R}_+$ whose conclusions are in $S_+(\rho, \beta, N)$. By inspecting the concrete evaluation rule schemata in Section 5.5, $R_0 \neq \emptyset$. Let $j \geq 0$ be the maximal value for which there exist finite trees $\theta_0, \ldots, \theta_{j-1} \in \Theta_+$ where $P_0 = \theta_0(\epsilon), \ldots, P_{j-1} = \theta_{j-1}(\epsilon)$ are the first $j$ premises of a rule in $R_0$. Let $R_j \subseteq R_0$ be the set of all rules in $R_0$ with $P_0, \ldots, P_{j-1}$ as their first $j$ premises; then $R_j \neq \emptyset$. By inspecting the rule schemata in Section 5.5, it can be seen that, if there exists
$$\frac{P_0 \quad \cdots \quad P_{j-1} \quad P_j \quad \cdots}{s'} \in R_j$$
for some $P_j \in S_+(\rho_j, \beta_j, N_j)$ and $s' \in S_+(\rho, \beta, N)$, then⁶
$$\forall P_j' \in S_+(\rho_j, \beta_j, N_j) : \exists s \in S_+(\rho, \beta, N) .\ \frac{P_0 \quad \cdots \quad P_{j-1} \quad P_j' \quad \cdots}{s} \in R_j. \tag{76}$$

Suppose that $q \in \{e, d, g\}$, so that we can also assume $N = \langle u, \sigma \rangle$. We show by structural induction on $u$ that there exists $\theta \in \Theta_+$ such that $\theta(\epsilon) \in S_+(\rho, \beta, N)$. By inspecting the rule schemata in Section 5.5, it can be seen that, if $u$ is atomic, the rules in $R_0$ have no premises (so that $j = 0$) and hence, letting $\theta \in \Theta_+$ be the singleton tree consisting of the conclusion of a rule in $R_0$, we obtain that $\theta(\epsilon) \in S_+(\rho, \beta, N)$. Otherwise, if $u$ is not atomic, we show that each of the rules in $R_j$ has exactly $j$ premises; to do this, we assume there exists a rule in $R_j$ with a $(j+1)$-th premise $P_j$ and derive a contradiction. Let $N_j \in \Gamma_{q_j}^{\beta_j}$ be the non-terminal configuration in $P_j$. By inspecting the rule schemata in Section 5.5 in the case that $q \in \{e, d, g\}$, it can be seen that:

(i) $q_j \in \{e, d, g\}$, so that $N_j$ has the form $\langle u_j, \sigma_j \rangle$;
(ii) $u_j$ is a substructure of $u$ unless $R_j$ consists of instances of the schematic rule (31) and $j = 1$.

If $u_j$ is a substructure of $u$, by property (i), we can apply structural induction to obtain that there exists a finite tree $\theta_j \in \Theta_+$ such that $P_j = \theta_j(\epsilon) \in S_+(\rho_j, \beta_j, N_j)$; hence, by property (76), there exists a rule in $R_j$ having $P_j$ as its $(j+1)$-th premise, contradicting the assumption that $j$ was maximal. Otherwise, by property (ii), if $u_j$ is not a substructure of $u$, the rules in $R_j$ must be instances of rule schema (31) and $j = 1$; in this case, rule schema (23), which has no premises, can be instantiated with the second premise of a rule in $R_j$ as its conclusion, and again we have a contradiction. Thus, for any $u_j$, all rules in $R_j$ have exactly $j$ premises. By Definition 5.7, $\frac{\theta_0 \ \cdots\ \theta_{j-1}}{s} \in \Theta_+$ for some $s \in S_+(\rho, \beta, N)$. Therefore, since $\Theta_+ \subseteq \Theta$, the thesis holds when $q \in \{e, d, g\}$.

Suppose now that $q \in \{s, b, k\}$. We prove that, if there does not exist a tree $\theta \in \Theta_+$ such that $\theta(\epsilon) \in \mathcal{S}_+(\rho, \beta, N)$, then, for all $n \geq 0$, there exists a tree $\theta$ such that $\theta(\epsilon) = s_\infty \stackrel{\mathrm{def}}{=} (\rho \vdash_\beta N \stackrel{\infty}{\longrightarrow})$ and $\theta \in \mathcal{F}^n(\mathcal{U})$. To this end, we reason by induction on $n \geq 0$. By our assumption that there is no tree $\theta \in \Theta_+$ such that $\theta(\epsilon) \in \mathcal{S}_+(\rho, \beta, N)$, there must exist a rule

$$\frac{P_0 \cdots P_{j-1} \quad P_j \cdots}{s} \in \mathcal{R}$$

for some $P_j \in \mathcal{S}_+(\rho_j, \beta_j, N_j)$; let $q_j$ be such that $N_j \in \Gamma_{q_j}$. By the maximality of $j$, there is no tree in $\Theta_+$ whose root is $P_j$. We have already shown that, if $q_j \in \{e, d, g\}$, then there exists a tree $\theta_j \in \Theta_+$ such that $\theta_j(\epsilon) \in \mathcal{S}_+(\rho_j, \beta_j, N_j)$; thus, by property (76), there must be a rule in $\mathcal{R}_j$ whose $(j+1)$-th premise is $\theta_j(\epsilon)$, contradicting the assumption that $j$ is maximal. Hence $q_j \in \{s, b, k\}$. By the definition of the negative concrete semantics rules in Section 5.6, there exists a corresponding negative rule

$$\frac{P_0 \cdots P_{j-1} \quad P_\infty}{s_\infty} \in \mathcal{R}_-$$

such that $P_\infty = (\rho_j \vdash_{\beta_j} N_j \stackrel{\infty}{\longrightarrow})$. Hence, by Definition 5.6, there exists a tree in $\mathcal{U} = \mathcal{F}^0(\mathcal{U})$ with root $s_\infty$, so that the thesis holds for $n = 0$. Suppose now that $n > 0$. By the inductive hypothesis, there exists a tree $\theta_\infty \in \mathcal{F}^{n-1}(\mathcal{U})$ such that $\theta_\infty(\epsilon) = P_\infty$. Hence, by Definition 5.7, $\frac{\theta_0 \cdots \theta_{j-1} \ \theta_\infty}{s_\infty} \in \mathcal{F}^n(\mathcal{U})$. Thus, for all $n \geq 0$, there exists a tree in $\mathcal{F}^n(\mathcal{U})$ with root $s_\infty$ and hence, by Definition 5.7, there exists a tree in $\Theta_-$ with root $s_\infty$. Since $\Theta = \Theta_+ \uplus \Theta_-$, the thesis holds when $q \in \{s, b, k\}$. □

6 To help understand property (76), we illustrate it in the case that $q = e$ and the non-terminal configuration is $N = \langle b_0\ \mathbf{and}\ b_1, \sigma\rangle$; hence the concrete rule schemata (15)-(17) will apply. In all the rule instances, the first premise is of the form $P_0 = (\rho \vdash_\beta N_0 \to \eta_0)$, where $N_0 = \langle b_0, \sigma\rangle$; as a consequence, we have $\mathcal{S}_+(\rho, \beta, N_0) = \{ (\rho \vdash_\beta N_0 \to \eta_0) \mid \eta_0 \in B \}$, where $B \stackrel{\mathrm{def}}{=} \mathrm{ExceptState} \uplus \{ \langle t, \sigma_0\rangle \in T_e \mid t \in \mathrm{Bool}, \sigma_0 \in \mathrm{Mem} \}$. Thus, for each terminal configuration $\eta_0 \in B$, there is a rule instance having $\eta_0$ in its first premise: that is, we instantiate rule (15) when $\eta_0 = \varepsilon$, rule (16) when $\eta_0 = \langle \mathrm{ff}, \sigma_0\rangle$ and rule (17) when $\eta_0 = \langle \mathrm{tt}, \sigma_0\rangle$. Thus property (76) holds for $j = 0$. Moreover, although only rule (17) applies when $j = 1$, the terminal configuration for the second premise ($P_1$) is just any terminal configuration in $T_e$. Thus property (76) also holds for $j = 1$.

The concrete semantics of a valid program $g$ with respect to the initial memory structure $\sigma_i \stackrel{\mathrm{def}}{=} (\emptyset, \epsilon) \in \mathrm{Mem}$ is a set of concrete semantics trees. This set will always include a tree $\theta_0 \in \Theta$ (which, by Proposition 5.8, must exist) such that

$$\theta_0(\epsilon) = \bigl(\emptyset \vdash_\emptyset \langle (g; \mathbf{gvar}\ x : \mathbf{integer} = 0), \sigma_i\rangle \to \eta_0\bigr).$$

If $\eta_0 = \varepsilon_0$, i.e., an RTS exception is thrown during the evaluation of $g$, then the concrete semantics is $\{\theta_0\}$. If, instead, $\eta_0 = \langle \rho_0, \sigma_0\rangle$, then the concrete semantics is

The concrete semantics for CPM we have just presented, extended as indicated in Section 9, allows us to reason about a number of interesting program safety properties (such as the absence of division-by-zero and other run-time errors) as well as termination and computational complexity. In the next section, we will see how the usually non-computable concrete semantics can be given an abstract counterpart that is amenable to effective computation.

6. ABSTRACT DYNAMIC SEMANTICS

For the specification of the abstract semantics, we mainly follow the approach outlined in the works by Schmidt [Sch95; Sch97; Sch98]. The specification of the abstract semantics requires that appropriate abstract domains be chosen to provide correct approximations for the values that are involved in the concrete computation [CC77a; CC79; CC92a; CC92c]. For the sake of generality and extensibility, we will not target any specific abstraction, but rather consider arbitrary abstract domains that satisfy a limited set of properties that are sufficient to ensure the correctness of the overall analysis without compromising its potential precision.

6.1 Abstract Semantic Domains

We adopt the framework proposed in [CC92a, Section 7], where the correspondence between the concrete and the abstract domains is induced from a concrete approximation relation and a concretization function. For the sole purpose of simplifying the presentation, we will consider a particular instance of the framework by assuming a few additional but non-essential domain properties. The resulting construction is adequate for our purposes and still allows for algebraically weak abstract domains, such as the domain of convex polyhedra [CH78].

A concrete domain is modeled as a complete lattice $(C, \sqsubseteq, \bot, \top, \sqcap, \sqcup)$ of semantic properties; as usual, the concrete approximation relation $c_1 \sqsubseteq c_2$ holds if $c_1$ is a stronger property than $c_2$ (i.e., $c_2$ approximates $c_1$). An abstract domain is modeled as a bounded join-semilattice $(D^\sharp, \sqsubseteq^\sharp, \bot^\sharp, \sqcup^\sharp)$, so that it has a bottom element $\bot^\sharp$ and the least upper bound $d_1^\sharp \sqcup^\sharp d_2^\sharp$ exists for all $d_1^\sharp, d_2^\sharp \in D^\sharp$. When the abstract domain is also provided with a top element $\top^\sharp \in D^\sharp$, we will write $(D^\sharp, \sqsubseteq^\sharp, \bot^\sharp, \top^\sharp, \sqcup^\sharp)$. The abstract domain $D^\sharp$ is related to $C$ by a monotonic concretization function $\gamma \colon D^\sharp \to C$: in words, $C$ is approximated by $D^\sharp$ through $\gamma$; this approximation is said to be strict if $\gamma$ is a strict function.$^7$

In order to compute approximations for specific concrete objects, we assume the existence of a partial abstraction function $\alpha \colon C \rightarrowtail D^\sharp$ such that, for each $c \in C$, if $\alpha(c)$ is defined then $c \sqsubseteq \gamma(\alpha(c))$. In particular, we assume that $\alpha(\bot) = \bot^\sharp$ is always defined; if an abstract top element exists, then $\alpha(\top) = \top^\sharp$ is also defined. When needed or useful, we will require a few additional properties.

Most of the concrete domains used in the concrete semantics construction are obtained as the powerset lattice $(\wp(D), \subseteq, \emptyset, D, \cap, \cup)$ of some set of concrete objects $D$. In such a situation, for each concrete object $d \in D$ and abstract element $d^\sharp \in D^\sharp$ such that the corresponding domains are related by the concretization function $\gamma \colon D^\sharp \to \wp(D)$, we write $d \propto d^\sharp$ and $d \not\propto d^\sharp$ to denote the assertions $d \in \gamma(d^\sharp)$ and $d \notin \gamma(d^\sharp)$, respectively. For a lighter notation, we denote $\sqsubseteq^\sharp$, $\bot^\sharp$, $\top^\sharp$ and $\sqcup^\sharp$ by $\sqsubseteq$, $\bot$, $\top$ and $\sqcup$, respectively. We also overload the symbols $\sqsubseteq$, $\bot$, $\top$, $\sqcup$, $\gamma$ and $\alpha$: the context will always make clear which incarnation has to be considered.
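To make the roles of $\gamma$, $\alpha$ and $\propto$ concrete, here is a minimal Python sketch of one possible abstract domain for $\wp(\mathrm{Integer})$: the classic five-element sign lattice. The domain, the names `Sign`, `leq`, `join`, `gamma_contains` and `alpha`, and the decision to represent the (possibly infinite) sets $\gamma(d^\sharp)$ by a membership test are all illustrative assumptions, not part of the framework. Note that $\alpha$ is only required to be partial; this toy version happens to be total on finite sets.

```python
from enum import Enum
from typing import Iterable


class Sign(Enum):
    BOT = "bot"    # concretizes to the empty set
    NEG = "neg"    # strictly negative integers
    ZERO = "zero"  # {0}
    POS = "pos"    # strictly positive integers
    TOP = "top"    # all integers


def leq(a: Sign, b: Sign) -> bool:
    """The abstract ordering: BOT is below everything, TOP above everything."""
    return a is Sign.BOT or b is Sign.TOP or a is b


def join(a: Sign, b: Sign) -> Sign:
    """Least upper bound in the sign lattice."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return Sign.TOP


def gamma_contains(d: Sign, m: int) -> bool:
    """Decide m ∝ d, i.e. m ∈ γ(d); γ(d) itself may be infinite."""
    if d is Sign.BOT:
        return False
    if d is Sign.NEG:
        return m < 0
    if d is Sign.ZERO:
        return m == 0
    if d is Sign.POS:
        return m > 0
    return True  # TOP


def alpha(c: Iterable[int]) -> Sign:
    """Abstraction of a finite set of integers, so that c ⊆ γ(alpha(c))."""
    result = Sign.BOT
    for m in c:
        s = Sign.NEG if m < 0 else Sign.ZERO if m == 0 else Sign.POS
        result = join(result, s)
    return result
```

For example, `alpha({-1, 3})` is `Sign.TOP`, since no single sign covers both values; $\gamma$ is strict here because `gamma_contains(Sign.BOT, m)` is false for every `m`.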

The approximations of composite concrete domains are typically obtained by suitably combining the approximations already available for their basic components. For $i = 1, 2$, let $D_i$ be a set of concrete objects and consider the corresponding powerset lattice $(\wp(D_i), \subseteq, \emptyset, D_i, \cap, \cup)$; let also $D_i^\sharp$ be an abstract domain related to $\wp(D_i)$ by the concretization function $\gamma_i \colon D_i^\sharp \to \wp(D_i)$.

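6.1.1 Approximation of Cartesian Products. Values of the Cartesian product $D_1 \times D_2$ can be approximated by elements of the Cartesian product $D_1^\sharp \times D_2^\sharp$. Namely, the component-wise ordered abstract domain $(D_1^\sharp \times D_2^\sharp, \sqsubseteq, \bot, \sqcup)$ is related to the concrete powerset lattice $(\wp(D_1 \times D_2), \subseteq, \emptyset, D_1 \times D_2, \cap, \cup)$ by the concretization function $\gamma \colon (D_1^\sharp \times D_2^\sharp) \to \wp(D_1 \times D_2)$ defined, for each $(d_1^\sharp, d_2^\sharp) \in D_1^\sharp \times D_2^\sharp$, by

$$\gamma(d_1^\sharp, d_2^\sharp) \stackrel{\mathrm{def}}{=} \bigl\{ (d_1, d_2) \in D_1 \times D_2 \bigm| d_1 \in \gamma_1(d_1^\sharp), d_2 \in \gamma_2(d_2^\sharp) \bigr\}. \tag{77}$$

Hence, $(d_1, d_2) \propto (d_1^\sharp, d_2^\sharp)$ holds if and only if $d_1 \propto d_1^\sharp$ and $d_2 \propto d_2^\sharp$.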

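If the underlying approximations $D_1^\sharp$ and $D_2^\sharp$ are both strict, then a better approximation scheme can be obtained by adopting the strict product (also called smash product) construction, which performs a simple form of reduction by collapsing $(d_1^\sharp, d_2^\sharp)$ to the bottom element whenever $d_1^\sharp = \bot$ or $d_2^\sharp = \bot$. Namely,

$$D_1^\sharp \otimes D_2^\sharp \stackrel{\mathrm{def}}{=} \bigl\{ (d_1^\sharp, d_2^\sharp) \in D_1^\sharp \times D_2^\sharp \bigm| d_1^\sharp = \bot \text{ if and only if } d_2^\sharp = \bot \bigr\}.$$

The concretization function is defined exactly as in (77). The constructor function $\cdot \otimes \cdot \colon (D_1^\sharp \times D_2^\sharp) \to (D_1^\sharp \otimes D_2^\sharp)$ is defined by

$$d_1^\sharp \otimes d_2^\sharp \stackrel{\mathrm{def}}{=} \begin{cases} (d_1^\sharp, d_2^\sharp), & \text{if } d_1^\sharp \neq \bot \text{ and } d_2^\sharp \neq \bot; \\ \bot, & \text{otherwise.} \end{cases}$$

7 Let $f \colon D_1 \times \cdots \times D_n \to D_0$, where $(D_i, \sqsubseteq_i, \bot_i, \sqcup_i)$ is a bounded join-semilattice, for each $i = 0, \ldots, n$. Then, function $f$ is strict on the $i$-th argument if $d_i = \bot_i$ implies $f(d_1, \ldots, d_n) = \bot_0$.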
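The difference between the plain Cartesian product and the smash product can be seen on a toy instance. The Python sketch below takes both components to be integer intervals, with `None` standing for $\bot$; the interval choice and all names are assumptions made for illustration. Several distinct pairs of the plain product have an empty concretization under (77), whereas the constructor `smash` maps all of them to the single bottom element.

```python
from typing import Optional, Tuple

# An abstract integer is an interval (lo, hi) with lo <= hi, or None for bottom.
Interval = Optional[Tuple[int, int]]


def in_gamma(d: Interval, x: int) -> bool:
    """x ∝ d for the interval domain."""
    return d is not None and d[0] <= x <= d[1]


def product_contains(d1: Interval, d2: Interval, pair: Tuple[int, int]) -> bool:
    """Concretization (77): (x1, x2) ∝ (d1, d2) iff x1 ∝ d1 and x2 ∝ d2."""
    x1, x2 = pair
    return in_gamma(d1, x1) and in_gamma(d2, x2)


def smash(d1: Interval, d2: Interval):
    """The constructor d1 ⊗ d2: collapse to bottom (None) if either part is bottom."""
    if d1 is None or d2 is None:
        return None
    return (d1, d2)


# Three different elements of the plain product, all denoting the empty set ...
plain_bottoms = [(None, (0, 5)), ((0, 5), None), (None, None)]
# ... are identified by the smash product.
smash_bottoms = {smash(d1, d2) for d1, d2 in plain_bottoms}
```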



6.1.2 Approximation of Disjoint Unions. In order to provide an abstract domain approximating sets of concrete objects drawn from a disjoint union, we use the following well-known construction several times.

Suppose that $D_1 \cap D_2 = \emptyset$. Then, values of the disjoint union $D = D_1 \uplus D_2$ can be approximated by elements of the Cartesian product $D^\sharp = D_1^\sharp \times D_2^\sharp$. In this case, the abstract domain $D^\sharp$ is related to the concrete powerset lattice $(\wp(D), \subseteq, \emptyset, D, \cap, \cup)$ by means of the concretization function $\gamma \colon (D_1^\sharp \times D_2^\sharp) \to \wp(D_1 \uplus D_2)$ defined, for each $(d_1^\sharp, d_2^\sharp) \in D_1^\sharp \times D_2^\sharp$, by

$$\gamma(d_1^\sharp, d_2^\sharp) \stackrel{\mathrm{def}}{=} \gamma_1(d_1^\sharp) \uplus \gamma_2(d_2^\sharp).$$

Therefore, the approximation provided by $D^\sharp$ is strict if both $D_1^\sharp$ and $D_2^\sharp$ are so. In order to simplify notation, if $d_1^\sharp \in D_1^\sharp$ then we will sometimes write $d_1^\sharp$ to also denote the abstract element $(d_1^\sharp, \bot) \in D^\sharp$; similarly, $d_2^\sharp \in D_2^\sharp$ also denotes the abstract element $(\bot, d_2^\sharp) \in D^\sharp$. As usual, for each $i = 1, 2$ and $d_i \in D_i$, the notation $d_i \propto (d_1^\sharp, d_2^\sharp)$ stands for the assertion $d_i \in \gamma(d_1^\sharp, d_2^\sharp)$, which is equivalent to $d_i \in \gamma_i(d_i^\sharp)$. For the sake of clarity, the abstract domain $D^\sharp$ as specified above will be denoted by $D_1^\sharp \uplus^\sharp D_2^\sharp$. It is worth stressing that $D_1^\sharp \uplus^\sharp D_2^\sharp \neq D_1^\sharp \uplus D_2^\sharp$.
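A minimal Python sketch of $D_1^\sharp \uplus^\sharp D_2^\sharp$, assuming (for illustration only) that $D_1$ is the integers abstracted by intervals and $D_2$ the Booleans abstracted by subsets of $\{\mathrm{ff}, \mathrm{tt}\}$. An abstract element is a pair with one component per summand, and the two injection helpers implement the shorthand by which $d_1^\sharp$ denotes $(d_1^\sharp, \bot)$ and $d_2^\sharp$ denotes $(\bot, d_2^\sharp)$. All names are hypothetical. The join illustrates why $D_1^\sharp \uplus^\sharp D_2^\sharp$ differs from $D_1^\sharp \uplus D_2^\sharp$: it can describe a set containing both integers and Booleans.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class UnionAbs:
    """Abstract element (d1, d2) of D1♯ ⊎♯ D2♯."""
    ints: Optional[Tuple[int, int]]  # interval for the integer summand; None is bottom
    bools: FrozenSet[bool]           # subset of {False, True}; empty set is bottom


def contains(d: UnionAbs, x) -> bool:
    """x ∝ (d1, d2): each concrete value is checked against its own summand."""
    if isinstance(x, bool):  # test bool first: in Python, bool is a subclass of int
        return x in d.bools
    return d.ints is not None and d.ints[0] <= x <= d.ints[1]


def inj_int(d1: Tuple[int, int]) -> UnionAbs:
    """The shorthand d1 for (d1, ⊥)."""
    return UnionAbs(d1, frozenset())


def inj_bool(d2: Iterable[bool]) -> UnionAbs:
    """The shorthand d2 for (⊥, d2)."""
    return UnionAbs(None, frozenset(d2))


def join(a: UnionAbs, b: UnionAbs) -> UnionAbs:
    """Component-wise least upper bound."""
    if a.ints is None:
        ints = b.ints
    elif b.ints is None:
        ints = a.ints
    else:
        ints = (min(a.ints[0], b.ints[0]), max(a.ints[1], b.ints[1]))
    return UnionAbs(ints, a.bools | b.bools)
```

For instance, `join(inj_int((0, 3)), inj_bool({True}))` describes the set $\{0, 1, 2, 3, \mathrm{tt}\}$, which no single element of $D_1^\sharp \uplus D_2^\sharp$ could represent.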

6.2 Approximation of Integers

The concrete domain of integers $(\wp(\mathrm{Integer}), \subseteq, \emptyset, \mathrm{Integer}, \cap, \cup)$ is correctly approximated by an abstract domain $(\mathrm{Integer}^\sharp, \sqsubseteq, \bot, \top, \sqcup)$, where we assume that $\gamma$ is strict. Elements of $\mathrm{Integer}^\sharp$ are denoted by $m^\sharp$, $m_0^\sharp$, $m_1^\sharp$ and so forth. We assume that the partial abstraction function $\alpha \colon \wp(\mathrm{Integer}) \rightarrowtail \mathrm{Integer}^\sharp$ is defined on all singletons $\{m\} \in \wp(\mathrm{Integer})$. We also assume that there are abstract binary operations '$\oplus$', '$\ominus$', '$\odot$', '$\oslash$' and '$\circledcirc$' on $\mathrm{Integer}^\sharp$ that are strict on each argument and sound with respect to the corresponding operations on $\wp(\mathrm{Integer})$ which, in turn, are the obvious pointwise extensions of addition, subtraction, multiplication, division and remainder over the integers. More formally, we require $\gamma(m_0^\sharp \oplus m_1^\sharp) \supseteq \{ m_0 + m_1 \mid m_0 \in \gamma(m_0^\sharp), m_1 \in \gamma(m_1^\sharp) \}$ for each $m_0^\sharp, m_1^\sharp \in \mathrm{Integer}^\sharp$, to ensure that '$\oplus$' is sound with respect to addition. Likewise for '$\ominus$' and '$\odot$' with respect to subtraction and multiplication, respectively. For the '$\oslash$' operation we require soundness with respect to integer division, i.e., that, for each $m_0^\sharp, m_1^\sharp \in \mathrm{Integer}^\sharp$, $\gamma(m_0^\sharp \oslash m_1^\sharp) \supseteq \{ m_0 \div m_1 \mid m_0 \in \gamma(m_0^\sharp), m_1 \in \gamma(m_1^\sharp), m_1 \neq 0 \}$. Likewise for '$\circledcirc$' with respect to the '$\mathrm{mod}$' operation. We also assume there is a unary abstract operation, denoted by '$\ominus$', which is strict and sound with respect to the unary minus concrete operation, that is, $\gamma(\ominus m^\sharp) \supseteq \{ -m \mid m \in \gamma(m^\sharp) \}$.
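As an illustration of the soundness requirements on '$\oplus$' and '$\oslash$', the following Python sketch instantiates $\mathrm{Integer}^\sharp$ with integer intervals (`None` is $\bot$). The interval domain and the choice of division truncating toward zero for '$\div$' are assumptions of this sketch. Both operations are strict, and `div` ignores zero divisors, exactly as the soundness condition for '$\oslash$' allows.

```python
from typing import List, Optional, Tuple

Interval = Optional[Tuple[int, int]]  # (lo, hi) with lo <= hi; None is bottom


def tdiv(a: int, b: int) -> int:
    """Integer division truncating toward zero (assumed concrete semantics)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def hull(points: List[int]) -> Interval:
    """Smallest interval containing all points; bottom if there are none."""
    return (min(points), max(points)) if points else None


def add(x: Interval, y: Interval) -> Interval:
    """Abstract addition, strict on both arguments."""
    if x is None or y is None:
        return None
    return (x[0] + y[0], x[1] + y[1])


def div(x: Interval, y: Interval) -> Interval:
    """Abstract division: covers m0 / m1 for m0 in x and nonzero m1 in y."""
    if x is None or y is None:
        return None
    corners: List[int] = []
    # Split the divisor into its strictly negative and strictly positive
    # parts; on each part the quotient is monotone in each argument, so its
    # extreme values are reached at the corners of the box.
    for lo, hi in ((y[0], min(y[1], -1)), (max(y[0], 1), y[1])):
        if lo <= hi:
            corners += [tdiv(a, b) for a in x for b in (lo, hi)]
    return hull(corners)  # None when y is exactly (0, 0)
```

For example, `div((10, 20), (2, 5))` yields `(2, 10)`, while `div((1, 5), (0, 0))` yields `None`: no concrete quotient exists, so $\bot$ is a sound result (the division-by-zero exception is accounted for separately by the evaluation rules).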
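6.3 Approximation of Booleans

We assume a complete lattice $(\mathrm{Bool}^\sharp, \sqsubseteq, \bot, \top, \sqcap, \sqcup)$ is given that is related to the concrete domain of Booleans $(\wp(\mathrm{Bool}), \subseteq, \emptyset, \mathrm{Bool}, \cap, \cup)$ by means of a Galois connection where $\gamma$ is strict. Elements of $\mathrm{Bool}^\sharp$ are denoted by $t^\sharp$, $t_0^\sharp$, $t_1^\sharp$ and so forth. We assume that there are abstract operations '$\neg^\sharp$', '$\vee^\sharp$' and '$\wedge^\sharp$' on $\mathrm{Bool}^\sharp$ that are strict on each argument and sound with respect to the pointwise extensions of Boolean negation, disjunction and conjunction over $\wp(\mathrm{Bool})$. For instance, for the operation '$\vee^\sharp$' to be sound with respect to disjunction on $\wp(\mathrm{Bool})$, it is required that $\gamma(t_0^\sharp \vee^\sharp t_1^\sharp) \supseteq \{ t_0 \vee t_1 \mid t_0 \in \gamma(t_0^\sharp), t_1 \in \gamma(t_1^\sharp) \}$ for each $t_0^\sharp$ and $t_1^\sharp$ in $\mathrm{Bool}^\sharp$. Likewise for '$\wedge^\sharp$'. For operation '$\neg^\sharp$' to be sound with respect to negation on $\wp(\mathrm{Bool})$, we require that, for each $t^\sharp$ in $\mathrm{Bool}^\sharp$, $\gamma(\neg^\sharp t^\sharp) \supseteq \{ \neg t \mid t \in \gamma(t^\sharp) \}$.

Furthermore, we assume that there are abstract operations '$=^\sharp$', '$\neq^\sharp$', '$<^\sharp$', '$\leq^\sharp$', '$\geq^\sharp$' and '$>^\sharp$' on $\mathrm{Integer}^\sharp$ that are strict on each argument and sound with respect to the pointwise extensions over $\wp(\mathrm{Integer})$ of the corresponding relational operators '$=$', '$\neq$', '$<$', '$\leq$', '$\geq$' and '$>$' over the integers, considered as functions taking values in $\mathrm{Bool}$. For instance, for the operation '$=^\sharp$' to be sound with respect to equality on $\wp(\mathrm{Integer})$, we require that $\gamma(m_0^\sharp =^\sharp m_1^\sharp) \supseteq \{ m_0 = m_1 \mid m_0 \in \gamma(m_0^\sharp), m_1 \in \gamma(m_1^\sharp) \}$ for each $m_0^\sharp, m_1^\sharp \in \mathrm{Integer}^\sharp$. Likewise for '$\neq^\sharp$', '$<^\sharp$', '$\leq^\sharp$', '$\geq^\sharp$' and '$>^\sharp$'.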
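One simple instance satisfying these requirements takes $\mathrm{Bool}^\sharp$ to be $\wp(\mathrm{Bool})$ itself, with $\gamma$ the identity, and derives '$<^\sharp$' from integer intervals. The Python sketch below is an illustration of that assumed choice (all names are hypothetical); in this instance the operations happen to be exact, not just sound.

```python
from typing import FrozenSet, Optional, Tuple

BoolAbs = FrozenSet[bool]              # subsets of {False, True}; frozenset() is bottom
Interval = Optional[Tuple[int, int]]   # integer intervals; None is bottom


def b_not(t: BoolAbs) -> BoolAbs:
    """Abstract negation: pointwise, hence strict and sound."""
    return frozenset(not a for a in t)


def b_or(t0: BoolAbs, t1: BoolAbs) -> BoolAbs:
    """Abstract disjunction: pointwise over all pairs (empty input gives empty output)."""
    return frozenset(a or b for a in t0 for b in t1)


def lt(m0: Interval, m1: Interval) -> BoolAbs:
    """Abstract '<' on intervals, sound w.r.t. { x < y | x in m0, y in m1 }."""
    if m0 is None or m1 is None:
        return frozenset()        # strictness
    result = set()
    if m0[0] < m1[1]:             # some x in m0 is below some y in m1
        result.add(True)
    if m0[1] >= m1[0]:            # some x in m0 is at or above some y in m1
        result.add(False)
    return frozenset(result)
```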

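6.4 Approximation of Storable Values

The concrete domain of storable values $(\wp(\mathrm{sVal}), \subseteq, \emptyset, \mathrm{sVal}, \cap, \cup)$, including both integers and Booleans, is abstracted by the domain $\mathrm{sVal}^\sharp \stackrel{\mathrm{def}}{=} \mathrm{Integer}^\sharp \uplus^\sharp \mathrm{Bool}^\sharp$. The hypotheses on $\mathrm{Integer}^\sharp$ and $\mathrm{Bool}^\sharp$ imply that the approximation is strict.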

6.5 Approximation of Exceptions

For the approximation of RTS exceptions, we assume that there is an abstract domain $(\mathrm{RTSExcept}^\sharp, \sqsubseteq, \bot, \top, \sqcup)$, which is related to the concrete powerset domain $(\wp(\mathrm{RTSExcept}), \subseteq, \emptyset, \mathrm{RTSExcept}, \cap, \cup)$ by a strict concretization function. The partial abstraction function $\alpha \colon \wp(\mathrm{RTSExcept}) \rightarrowtail \mathrm{RTSExcept}^\sharp$ is assumed to be defined on all singletons. Elements of $\mathrm{RTSExcept}^\sharp$ are denoted by $\chi^\sharp$, $\chi_0^\sharp$, $\chi_1^\sharp$ and so forth.

Generic exceptions, including both RTS exceptions and user-defined exceptions, are approximated by elements of the domain $\mathrm{Except}^\sharp \stackrel{\mathrm{def}}{=} \mathrm{RTSExcept}^\sharp \uplus^\sharp \mathrm{sVal}^\sharp$. The hypotheses on its components imply that the approximation is strict. Elements of $\mathrm{Except}^\sharp$ are denoted by $\xi^\sharp$, $\xi_0^\sharp$, $\xi_1^\sharp$ and so forth.

6.6 Approximation of Memory Structures, Value States and Exception States

Here we differ from other published abstract semantics in that we explicitly cater for relational abstract domains as well as for attribute-independent ones [CC79]. While this complicates the presentation, it results in a truly generic abstract semantics. Moreover, the approach presented here is, all things considered, quite simple and translates into a modular, clean design of the analyzer.

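Definition 6.1. ($\mathrm{Mem}^\sharp$, $\mathrm{ValState}^\sharp$, $\mathrm{ExceptState}^\sharp$.) We assume there exists an abstract domain $(\mathrm{Mem}^\sharp, \sqsubseteq, \bot, \sqcup)$ that is related, by means of a strict concretization function, to the concrete powerset domain $(\wp(\mathrm{Mem}), \subseteq, \emptyset, \mathrm{Mem}, \cap, \cup)$. Elements of $\mathrm{Mem}^\sharp$ are denoted by $\sigma^\sharp$, $\sigma_0^\sharp$, $\sigma_1^\sharp$ and so forth. We assume that, for each $\sigma \in \mathrm{Mem}$, there exists $\sigma^\sharp \in \mathrm{Mem}^\sharp$ such that $\sigma \propto \sigma^\sharp$.

The abstract domain of value states is $\mathrm{ValState}^\sharp \stackrel{\mathrm{def}}{=} \mathrm{sVal}^\sharp \otimes \mathrm{Mem}^\sharp$. Elements of $\mathrm{ValState}^\sharp$ will be denoted by $\upsilon^\sharp$, $\upsilon_0^\sharp$, $\upsilon_1^\sharp$ and so forth.

The abstract domain of exception states is $\mathrm{ExceptState}^\sharp \stackrel{\mathrm{def}}{=} \mathrm{Mem}^\sharp \otimes \mathrm{Except}^\sharp$. Elements of $\mathrm{ExceptState}^\sharp$ will be denoted by $\varepsilon^\sharp$, $\varepsilon_0^\sharp$, $\varepsilon_1^\sharp$ and so forth. To improve readability, $\mathrm{none}^\sharp$ will denote the bottom element $\bot \in \mathrm{ExceptState}^\sharp$, indicating that no exception is possible.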

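The abstract memory structure read and update operators

$$\cdot[\cdot, \cdot] \colon (\mathrm{Mem}^\sharp \times \mathrm{Addr} \times \mathrm{sType}) \to (\mathrm{ValState}^\sharp \uplus^\sharp \mathrm{ExceptState}^\sharp),$$
$$\cdot[\cdot :=^\sharp \cdot] \colon \bigl(\mathrm{Mem}^\sharp \times (\mathrm{Addr} \times \mathrm{sType}) \times \mathrm{sVal}^\sharp\bigr) \to (\mathrm{Mem}^\sharp \uplus^\sharp \mathrm{ExceptState}^\sharp)$$

are assumed to be such that, for each $\sigma^\sharp \in \mathrm{Mem}^\sharp$, $a \in \mathrm{Addr}$, $\mathrm{sT} \in \mathrm{sType}$ and $\mathrm{sval}^\sharp \in \mathrm{sVal}^\sharp$:

$$\gamma\bigl(\sigma^\sharp[a, \mathrm{sT}]\bigr) \supseteq \bigl\{ \sigma[a, \mathrm{sT}] \bigm| \sigma \in \gamma(\sigma^\sharp) \bigr\},$$
$$\gamma\bigl(\sigma^\sharp[(a, \mathrm{sT}) :=^\sharp \mathrm{sval}^\sharp]\bigr) \supseteq \bigl\{ \sigma[(a, \mathrm{sT}) := \mathrm{sval}] \bigm| \sigma \in \gamma(\sigma^\sharp), \mathrm{sval} \in \gamma(\mathrm{sval}^\sharp) \bigr\}.$$

The abstract data and stack memory allocation functions

$$\mathrm{new}_d^\sharp \colon \mathrm{ValState}^\sharp \to \bigl((\mathrm{Mem}^\sharp \times \mathrm{Loc}) \uplus^\sharp \mathrm{ExceptState}^\sharp\bigr),$$
$$\mathrm{new}_s^\sharp \colon \mathrm{ValState}^\sharp \to \bigl((\mathrm{Mem}^\sharp \times \mathrm{Ind}) \uplus^\sharp \mathrm{ExceptState}^\sharp\bigr)$$

are assumed to be such that, for each $\upsilon \in \mathrm{ValState}$ and $\upsilon^\sharp \in \mathrm{ValState}^\sharp$ such that $\upsilon \in \gamma(\upsilon^\sharp)$, and each $h \in \{d, s\}$: if $\mathrm{new}_h(\upsilon) = (\sigma, a)$ (resp., $\mathrm{new}_h(\upsilon) = \varepsilon$) and $\mathrm{new}_h^\sharp(\upsilon^\sharp) = \bigl((\sigma^\sharp, a'), \varepsilon^\sharp\bigr)$, then $\sigma \in \gamma(\sigma^\sharp)$ and $a = a'$ (resp., $\varepsilon \in \gamma(\varepsilon^\sharp)$).

The abstract memory structure data cleanup function

$$\mathrm{cleanup}_d^\sharp \colon \mathrm{ExceptState}^\sharp \to \mathrm{ExceptState}^\sharp$$

is such that, for each $\varepsilon^\sharp \in \mathrm{ExceptState}^\sharp$, we have

$$\gamma\bigl(\mathrm{cleanup}_d^\sharp(\varepsilon^\sharp)\bigr) \supseteq \bigl\{ \mathrm{cleanup}_d(\varepsilon) \bigm| \varepsilon \in \gamma(\varepsilon^\sharp) \bigr\}.$$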

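The abstract functions

$$\mathrm{mark}_s^\sharp, \mathrm{unmark}_s^\sharp, \mathrm{link}_s^\sharp, \mathrm{unlink}_s^\sharp \colon \mathrm{Mem}^\sharp \to \mathrm{Mem}^\sharp$$

are defined to be such that, for each $\sigma^\sharp \in \mathrm{Mem}^\sharp$:

$$\gamma\bigl(\mathrm{mark}_s^\sharp(\sigma^\sharp)\bigr) \supseteq \bigl\{ \mathrm{mark}_s(\sigma) \bigm| \sigma \in \gamma(\sigma^\sharp) \bigr\},$$
$$\gamma\bigl(\mathrm{unmark}_s^\sharp(\sigma^\sharp)\bigr) \supseteq \bigl\{ \mathrm{unmark}_s(\sigma) \bigm| \sigma \in \gamma(\sigma^\sharp) \text{ and } \mathrm{unmark}_s(\sigma) \text{ is defined} \bigr\},$$
$$\gamma\bigl(\mathrm{link}_s^\sharp(\sigma^\sharp)\bigr) \supseteq \bigl\{ \mathrm{link}_s(\sigma) \bigm| \sigma \in \gamma(\sigma^\sharp) \text{ and } \mathrm{link}_s(\sigma) \text{ is defined} \bigr\},$$
$$\gamma\bigl(\mathrm{unlink}_s^\sharp(\sigma^\sharp)\bigr) \supseteq \bigl\{ \mathrm{unlink}_s(\sigma) \bigm| \sigma \in \gamma(\sigma^\sharp) \text{ and } \mathrm{unlink}_s(\sigma) \text{ is defined} \bigr\}.$$

It is assumed that all the abstract operators mentioned above are strict on each of their arguments taken from an abstract domain.

As done in the concrete, the abstract stack unmark and the abstract frame unlink functions are lifted to also work on abstract exception states. Namely, for each $\varepsilon^\sharp = (\sigma^\sharp, \xi^\sharp) \in \mathrm{ExceptState}^\sharp$,

$$\mathrm{unmark}_s^\sharp(\sigma^\sharp, \xi^\sharp) \stackrel{\mathrm{def}}{=} \bigl(\mathrm{unmark}_s^\sharp(\sigma^\sharp), \xi^\sharp\bigr), \qquad \mathrm{unlink}_s^\sharp(\sigma^\sharp, \xi^\sharp) \stackrel{\mathrm{def}}{=} \bigl(\mathrm{unlink}_s^\sharp(\sigma^\sharp), \xi^\sharp\bigr).$$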

Besides the abstract operators specified above, which closely mimic the concrete 
operators related to concrete memory structures and exception states, other ab- 
stract operators will be used in the abstract semantics construction so as to enhance 
its precision. 

When dealing with Boolean guards during the abstract evaluation of conditional 
and iteration statements, it might be the case that no definite information is avail- 
able. In such a situation, the abstract execution can be made more precise if the 
abstract memory structure is filtered according to the condition holding in the 
considered computation branch. 

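Definition 6.2. (Memory structure filter.) An abstract memory structure filter is any computable function $\phi \colon (\mathrm{Env} \times \mathrm{Mem}^\sharp \times \mathrm{Exp}) \to \mathrm{Mem}^\sharp$ such that, for each $e \in \mathrm{Exp}$, each $\beta \colon I$ with $\mathrm{FI}(e) \subseteq I$ and $\beta \vdash_I e : \mathbf{boolean}$, for each $\rho \in \mathrm{Env}$ with $\rho : \beta$ and each $\sigma^\sharp \in \mathrm{Mem}^\sharp$, if $\phi(\rho, \sigma^\sharp, e) = \sigma_{\mathrm{tt}}^\sharp$, then

$$\gamma(\sigma_{\mathrm{tt}}^\sharp) \supseteq \bigl\{ \sigma_{\mathrm{tt}} \in \mathrm{Mem} \bigm| \sigma \in \gamma(\sigma^\sharp), \rho \vdash_\beta \langle e, \sigma\rangle \to \langle \mathrm{tt}, \sigma_{\mathrm{tt}}\rangle \bigr\}.$$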
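For a non-relational interval memory, a filter for guards of the form $x < c$ can simply shrink the interval of $x$. The Python sketch below makes several simplifying assumptions that are not part of Definition 6.2: abstract memories map variable names directly to intervals (the environment $\rho$ and addresses are elided), only the guards $x < c$ and $\mathbf{not}\ (x < c)$ are handled, and `None` plays the role of $\bot$. The function name `filter_lt` is hypothetical.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]
AbsMem = Optional[Dict[str, Interval]]  # None is the bottom abstract memory


def filter_lt(mem: AbsMem, var: str, c: int, positive: bool = True) -> AbsMem:
    """Sketch of φ(ρ, σ♯, e) for e = (var < c), or e = not (var < c) if positive is False.

    Only the part of var's interval compatible with the guard is kept; the result
    is bottom when no memory described by mem can satisfy the guard.
    """
    if mem is None:
        return None
    lo, hi = mem[var]
    if positive:
        hi = min(hi, c - 1)
    else:
        lo = max(lo, c)
    if lo > hi:
        return None
    refined = dict(mem)  # do not mutate the input abstract memory
    refined[var] = (lo, hi)
    return refined
```

With `m = {"x": (0, 10)}`, the two branches of `if x < 4 then ... else ...` would be analyzed from `filter_lt(m, "x", 4)`, i.e. `{"x": (0, 3)}`, and from `filter_lt(m, "x", 4, positive=False)`, i.e. `{"x": (4, 10)}`.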

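Similarly, abstract exception states can be filtered according to whether or not they can be caught by the guard of a catch clause.

Definition 6.3. (Exception state filters and selectors.) The abstract exception state filters are computable functions

$$\phi^+, \phi^- \colon (\mathrm{exceptDecl} \times \mathrm{ExceptState}^\sharp) \to \mathrm{ExceptState}^\sharp$$

such that, for each $p \in \mathrm{exceptDecl}$ and each $\varepsilon^\sharp \in \mathrm{ExceptState}^\sharp$,

$$\gamma\bigl(\phi^+(p, \varepsilon^\sharp)\bigr) \supseteq \begin{cases} \gamma(\varepsilon^\sharp), & \text{if } p = \mathbf{any}; \\ \bigl\{ (\sigma, \xi) \in \gamma(\varepsilon^\sharp) \bigm| \xi = p \bigr\}, & \text{if } p \in \mathrm{RTSExcept}; \\ \bigl\{ (\sigma, \xi) \in \gamma(\varepsilon^\sharp) \bigm| \xi \in \mathrm{dom}(\mathrm{type}(p)) \bigr\}, & \text{otherwise;} \end{cases}$$

$$\gamma\bigl(\phi^-(p, \varepsilon^\sharp)\bigr) \supseteq \begin{cases} \emptyset, & \text{if } p = \mathbf{any}; \\ \bigl\{ (\sigma, \xi) \in \gamma(\varepsilon^\sharp) \bigm| \xi \neq p \bigr\}, & \text{if } p \in \mathrm{RTSExcept}; \\ \bigl\{ (\sigma, \xi) \in \gamma(\varepsilon^\sharp) \bigm| \xi \notin \mathrm{dom}(\mathrm{type}(p)) \bigr\}, & \text{otherwise.} \end{cases}$$

The abstract memory structure and abstract exception selectors

$$\mathrm{mem} \colon \mathrm{ExceptState}^\sharp \to \mathrm{Mem}^\sharp,$$
$$\mathrm{sel} \colon (\mathrm{cType} \times \mathrm{ExceptState}^\sharp) \to (\mathrm{RTSExcept}^\sharp \uplus \mathrm{Integer}^\sharp \uplus \mathrm{Bool}^\sharp)$$

are defined, for each $\varepsilon^\sharp = \bigl(\sigma^\sharp, (\chi^\sharp, (m^\sharp, t^\sharp))\bigr) \in \mathrm{ExceptState}^\sharp$ and $\mathrm{cT} \in \mathrm{cType}$, by

$$\mathrm{mem}(\varepsilon^\sharp) \stackrel{\mathrm{def}}{=} \sigma^\sharp; \qquad \mathrm{sel}(\mathrm{cT}, \varepsilon^\sharp) \stackrel{\mathrm{def}}{=} \begin{cases} \chi^\sharp, & \text{if } \mathrm{cT} = \mathbf{rts\_exception}; \\ m^\sharp, & \text{if } \mathrm{cT} = \mathbf{integer}; \\ t^\sharp, & \text{if } \mathrm{cT} = \mathbf{boolean}. \end{cases}$$

To simplify notation, we will write $\mathrm{cT}(\varepsilon^\sharp)$ to denote $\mathrm{sel}(\mathrm{cT}, \varepsilon^\sharp)$.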
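The following Python sketch shows how the filters $\phi^+$ and $\phi^-$ might look for a deliberately coarse abstraction, assumed here for illustration only: an abstract exception state records an opaque abstract memory, the set of RTS exceptions that may be pending, and the set of types of user-thrown values that may be pending. The RTS exception name `overflow` is hypothetical; `divbyzero` is the one used in the evaluation rules. The memory component is passed through unchanged, which is sound though not maximally precise.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

RTS_EXCEPT = frozenset({"divbyzero", "overflow"})  # "overflow" is a hypothetical name


@dataclass(frozen=True)
class AbsExcState:
    mem: str                     # opaque stand-in for an abstract memory
    rts: FrozenSet[str]          # RTS exceptions that may be pending
    user_types: FrozenSet[str]   # types ("integer", "boolean") of user exceptions


NONE = None  # none♯: no exception is possible


def make(mem: str, rts: Iterable[str], user_types: Iterable[str]) -> Optional[AbsExcState]:
    """Build an abstract exception state, normalizing empty ones to none♯."""
    rts, user_types = frozenset(rts), frozenset(user_types)
    return AbsExcState(mem, rts, user_types) if rts or user_types else NONE


def phi_plus(p: str, e: Optional[AbsExcState]) -> Optional[AbsExcState]:
    """Exceptions that the catch guard p may catch."""
    if e is NONE:
        return NONE
    if p == "any":
        return e
    if p in RTS_EXCEPT:
        return make(e.mem, e.rts & {p}, ())
    return make(e.mem, (), e.user_types & {p})  # p is a type name


def phi_minus(p: str, e: Optional[AbsExcState]) -> Optional[AbsExcState]:
    """Exceptions that may escape the catch guard p."""
    if e is NONE or p == "any":
        return NONE
    if p in RTS_EXCEPT:
        return make(e.mem, e.rts - {p}, e.user_types)
    return make(e.mem, e.rts, e.user_types - {p})
```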

The generic specification provided above for abstract memory structures and 
the corresponding abstract operators plays a central role for the modularity of the 
overall construction. By exploiting this "black box" approach, we achieve independence not only from the specific abstract domains used to approximate (sets of tuples of) storable values, but also from the critical design decisions that have to be taken when approximating the concrete stack, which may be unbounded in size due to recursive functions. Hence, while staying within the boundaries of the current framework, we can flexibly explore, combine, and finely tune the sophisticated proposals that have been put forward in the literature, such as the work in [JS03; JS04], which encompasses both the functional and the call string approaches to interprocedural analysis [CC77b; SP81].

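6.7 Abstract Configurations

Terminal and non-terminal configurations of the abstract transition system are now defined.

Definition 6.4. (Non-terminal abstract configurations.) The sets of non-terminal abstract configurations for expressions, local and global declarations, statements, function bodies and catch clauses are given, for each $\beta \in \mathrm{TEnv}_I$ and respectively, by

$$\begin{aligned}
\Gamma_e^\sharp &\stackrel{\mathrm{def}}{=} \bigl\{ \langle e, \sigma^\sharp\rangle \in \mathrm{Exp} \times \mathrm{Mem}^\sharp \bigm| \exists \mathrm{sT} \in \mathrm{sType} \,.\, \beta \vdash_I e : \mathrm{sT} \bigr\}, \\
\Gamma_d^\sharp &\stackrel{\mathrm{def}}{=} \bigl\{ \langle d, \sigma^\sharp\rangle \in \mathrm{Decl} \times \mathrm{Mem}^\sharp \bigm| \exists \delta \in \mathrm{TEnv} \,.\, \beta \vdash_I d : \delta \bigr\}, \\
\Gamma_g^\sharp &\stackrel{\mathrm{def}}{=} \bigl\{ \langle g, \sigma^\sharp\rangle \in \mathrm{Glob} \times \mathrm{Mem}^\sharp \bigm| \exists \delta \in \mathrm{TEnv} \,.\, \beta \vdash_I g : \delta \bigr\}, \\
\Gamma_s^\sharp &\stackrel{\mathrm{def}}{=} \bigl\{ \langle s, \sigma^\sharp\rangle \in \mathrm{Stmt} \times \mathrm{Mem}^\sharp \bigm| \beta \vdash_I s \bigr\}, \\
\Gamma_b^\sharp &\stackrel{\mathrm{def}}{=} \bigl\{ \langle \mathit{body}, \sigma^\sharp\rangle \in \mathrm{Body} \times \mathrm{Mem}^\sharp \bigm| \exists \mathrm{sT} \in \mathrm{sType} \,.\, \beta \vdash_I \mathit{body} : \mathrm{sT} \bigr\}, \\
\Gamma_k^\sharp &\stackrel{\mathrm{def}}{=} \bigl\{ \langle k, \varepsilon^\sharp\rangle \in \mathrm{Catch} \times \mathrm{ExceptState}^\sharp \bigm| \beta \vdash_I k \bigr\}.
\end{aligned}$$

We write $N^\sharp$ to denote a non-terminal abstract configuration.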

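The approximation relation between concrete and abstract non-terminal configurations is defined as follows. For each $q \in \{e, d, g, s, b\}$, each $N = \langle q_1, \sigma\rangle \in \Gamma_q$ and each $N^\sharp = \langle q_2, \sigma^\sharp\rangle \in \Gamma_q^\sharp$,

$$N \propto N^\sharp \iff (q_1 = q_2 \land \sigma \propto \sigma^\sharp). \tag{78}$$

For each $N = \langle k_1, \varepsilon\rangle \in \Gamma_k$ and $N^\sharp = \langle k_2, \varepsilon^\sharp\rangle \in \Gamma_k^\sharp$,

$$N \propto N^\sharp \iff (k_1 = k_2 \land \varepsilon \propto \varepsilon^\sharp). \tag{79}$$

Definition 6.5. (Terminal abstract configurations.) The sets of terminal abstract configurations for expressions, local and global declarations, statements, function bodies and catch clauses are given, respectively, by

$$\begin{aligned}
T_e^\sharp &\stackrel{\mathrm{def}}{=} \mathrm{ValState}^\sharp \uplus^\sharp \mathrm{ExceptState}^\sharp, \\
T_d^\sharp &\stackrel{\mathrm{def}}{=} T_g^\sharp \stackrel{\mathrm{def}}{=} (\mathrm{Env} \times \mathrm{Mem}^\sharp) \uplus^\sharp \mathrm{ExceptState}^\sharp, \\
T_s^\sharp &\stackrel{\mathrm{def}}{=} T_b^\sharp \stackrel{\mathrm{def}}{=} \mathrm{Mem}^\sharp \uplus^\sharp \mathrm{ExceptState}^\sharp, \\
T_k^\sharp &\stackrel{\mathrm{def}}{=} T_s^\sharp \uplus^\sharp \mathrm{ExceptState}^\sharp.
\end{aligned}$$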

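We write $\eta^\sharp$ to denote a terminal abstract configuration.

The approximation relation $\eta \propto \eta^\sharp$ between concrete and abstract terminal configurations is defined as follows. For expressions,

$$\eta \propto \langle \upsilon^\sharp, \varepsilon^\sharp\rangle \iff \begin{cases} \upsilon \propto \upsilon^\sharp, & \text{if } \eta = \upsilon; \\ \varepsilon \propto \varepsilon^\sharp, & \text{if } \eta = \varepsilon. \end{cases} \tag{80}$$

For local and global declarations,

$$\eta \propto \bigl\langle \langle \rho_2, \sigma^\sharp\rangle, \varepsilon^\sharp\bigr\rangle \iff \begin{cases} (\rho_1 = \rho_2 \land \sigma \propto \sigma^\sharp), & \text{if } \eta = \langle \rho_1, \sigma\rangle; \\ \varepsilon \propto \varepsilon^\sharp, & \text{if } \eta = \varepsilon. \end{cases} \tag{81}$$

For statements and function bodies,

$$\eta \propto \langle \sigma^\sharp, \varepsilon^\sharp\rangle \iff \begin{cases} \sigma \propto \sigma^\sharp, & \text{if } \eta = \sigma; \\ \varepsilon \propto \varepsilon^\sharp, & \text{if } \eta = \varepsilon. \end{cases} \tag{82}$$

For catch sequences,

$$\eta \propto \langle \eta_s^\sharp, \varepsilon^\sharp\rangle \iff \begin{cases} \eta \propto \eta_s^\sharp, & \text{if } \eta \in T_s; \\ \varepsilon \propto \varepsilon^\sharp, & \text{if } \eta = \langle \mathrm{uncaught}, \varepsilon\rangle. \end{cases} \tag{83}$$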

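The approximation relation for sequents is trivially obtained from the approximation relations defined above for configurations.

Definition 6.6. ('$\propto$' on sequents.) The approximation relation between concrete (positive and negative) sequents and abstract sequents is defined, for each $\beta \in \mathrm{TEnv}_I$, for each $\rho_0, \rho_1 \in \mathrm{Env}_J$ such that $\rho_0 : \beta|_J$ and $\rho_1 : \beta|_J$, for each $q \in \{e, d, g, s, b, k\}$, $N \in \Gamma_q$, $\eta \in T_q$, $N^\sharp \in \Gamma_q^\sharp$ and $\eta^\sharp \in T_q^\sharp$, by

$$(\rho_0 \vdash_\beta N \to \eta) \propto (\rho_1 \vdash_\beta N^\sharp \to \eta^\sharp) \iff (\rho_0 = \rho_1 \land N \propto N^\sharp \land \eta \propto \eta^\sharp); \tag{84}$$
$$(\rho_0 \vdash_\beta N \stackrel{\infty}{\longrightarrow}) \propto (\rho_1 \vdash_\beta N^\sharp \to \eta^\sharp) \iff (\rho_0 = \rho_1 \land N \propto N^\sharp). \tag{85}$$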

6.8 Supported Expressions, Declarations and Statements

Each abstract domain has to provide a relation saying which (abstract configurations for) expressions, declarations and statements it directly supports, as well as an abstract evaluation function providing safe approximations of any supported expressions, declarations and statements.

Definition 6.7. ($\mathrm{supported}^\sharp$, $\mathrm{eval}^\sharp$.) For each $q \in \{e, d, g, s\}$, we assume there exists a computable relation and a partial and computable operation,

$$\mathrm{supported}_q^\sharp \subseteq \mathrm{Env} \times \Gamma_q^\sharp \qquad \text{and} \qquad \mathrm{eval}_q^\sharp \colon (\mathrm{Env} \times \Gamma_q^\sharp) \rightarrowtail T_q^\sharp,$$

such that whenever $\rho : \beta$ and $\mathrm{supported}_q^\sharp(\rho, N^\sharp)$ holds, $\mathrm{eval}_q^\sharp(\rho, N^\sharp)$ is defined and has value $\eta^\sharp \in T_q^\sharp$ and, for each $N \in \Gamma_q$ and each $\eta \in T_q$ such that $N \propto N^\sharp$ and $\rho \vdash_\beta N \to \eta$, we have $\eta \propto \eta^\sharp$.
An appropriate use of '$\mathrm{supported}^\sharp$' and '$\mathrm{eval}^\sharp$' allows the design of the domain of abstract memory structures to be decoupled from the design of the analyzer. In particular, it enables the use of relational as well as non-relational domains. For example, by using the domain of convex polyhedra appropriately, one can easily implement a safe evaluation function for (the non-terminal abstract configuration of) any affine expression $e$. As a consequence, one can specify the support relation so that $\mathrm{supported}_e^\sharp(\rho, \langle e, \sigma^\sharp\rangle)$ holds. Similarly, one can specify that $\mathrm{supported}_s^\sharp(\rho, \langle \mathit{id} := e, \sigma^\sharp\rangle)$ holds for any affine assignment, i.e., an assignment where $e$ is an affine expression. Other implementation choices are possible. For instance, besides supporting affine expressions, the implementer could specify that $\mathrm{supported}_e^\sharp(\rho, \langle \mathit{id}_1 * \mathit{id}_2, \sigma^\sharp\rangle)$ holds provided $\rho : I$, $\mathit{id}_1, \mathit{id}_2 \in I$ and, for at least one $i \in \{1, 2\}$, the abstract value stored at $\rho(\mathit{id}_i)$ in $\sigma^\sharp$ concretizes to a singleton $\{m\}$, for some integer value $m$. Similarly, the design can impose that $\mathrm{supported}_e^\sharp(\rho, \langle \mathit{id} * \mathit{id}, \sigma^\sharp\rangle)$ always holds.
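As a sketch of how an implementer might decide $\mathrm{supported}_e^\sharp$ syntactically for affine expressions, the following Python function computes the affine form of an expression or reports that none was found. The tuple-based expression encoding and all names are hypothetical, the memory-dependent variant for $\mathit{id}_1 * \mathit{id}_2$ is not covered, and the actual evaluation on convex polyhedra (the $\mathrm{eval}_e^\sharp$ side) is deliberately left out. The check is conservative: for instance, it rejects `(x - x) * y` even though that expression is affine, which is harmless because unsupported expressions are still handled by the generic rules.

```python
from typing import Dict, Optional, Tuple

# Expressions: ("const", n) | ("var", name) | ("neg", e) | (op, e0, e1) with op in "+", "-", "*", "/"
Expr = tuple
Affine = Tuple[Dict[str, int], int]  # ({variable: coefficient}, constant term)


def to_affine(e: Expr) -> Optional[Affine]:
    """Return the affine form of e, or None if this sketch cannot find one."""
    tag = e[0]
    if tag == "const":
        return {}, e[1]
    if tag == "var":
        return {e[1]: 1}, 0
    if tag == "neg":
        sub = to_affine(e[1])
        if sub is None:
            return None
        coeffs, k = sub
        return {v: -c for v, c in coeffs.items()}, -k
    lhs, rhs = to_affine(e[1]), to_affine(e[2])
    if lhs is None or rhs is None:
        return None
    (c0, k0), (c1, k1) = lhs, rhs
    if tag in ("+", "-"):
        sign = 1 if tag == "+" else -1
        coeffs = dict(c0)
        for v, c in c1.items():
            coeffs[v] = coeffs.get(v, 0) + sign * c
        return coeffs, k0 + sign * k1
    if tag == "*":
        if not c0:  # constant * affine
            return {v: k0 * c for v, c in c1.items()}, k0 * k1
        if not c1:  # affine * constant
            return {v: k1 * c for v, c in c0.items()}, k0 * k1
    return None  # non-linear product, or an operator such as "/"


def supported_affine(e: Expr) -> bool:
    """A possible syntactic test behind supported♯ for affine expressions."""
    return to_affine(e) is not None
```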

6.9 Abstract Evaluation Relations

The abstract evaluation relations that provide the first part of the specification of the abstract interpreter for CPM are now defined. These relations are of the form

$$\rho \vdash_\beta N^\sharp \to \eta^\sharp,$$

where $\beta \in \mathrm{TEnv}$, $\rho : \beta$ and, for some $q \in \{e, d, g, s, b, k\}$, $N^\sharp \in \Gamma_q^\sharp$ and $\eta^\sharp \in T_q^\sharp$. The definition is again by structural induction from a set of rule schemata. In order to allow for the arbitrary weakening of the abstract descriptions in the conclusion, without having to introduce precondition strengthening and postcondition weakening rules, and to save typing at the same time, we will use the notation

$$\frac{P_0 \cdots P_{\ell-1}}{\rho \vdash_\beta N^\sharp \rightsquigarrow \eta_0^\sharp} \quad \text{(side condition)}$$

to denote

$$\frac{P_0 \cdots P_{\ell-1}}{\rho \vdash_\beta N^\sharp \to \eta^\sharp} \quad \text{(side condition) and } \eta_0^\sharp \sqsubseteq \eta^\sharp$$

where '$\sqsubseteq$' is the natural ordering relation on the appropriate abstract lattice (i.e., one of the $T_q^\sharp$, for $q \in \{e, d, g, s, b, k\}$).

Recalling the shorthand notation introduced in Section 6.1.2, when an abstract storable value $\mathrm{sval}^\sharp$ is expected and we write an abstract integer $m^\sharp$ or an abstract Boolean $t^\sharp$, then we are actually meaning the abstract storable value $(m^\sharp, \bot)$ or $(\bot, t^\sharp)$, respectively; similarly, when an abstract exception $\xi^\sharp$ is expected and we write an abstract RTS exception $\chi^\sharp$ or an abstract storable value $\mathrm{sval}^\sharp$, then we are actually meaning the abstract exceptions $(\chi^\sharp, \bot)$ or $(\bot, \mathrm{sval}^\sharp)$, respectively.

6.9.1 Unsupported Expressions. The following rules for the abstract evaluation of expressions apply only if $\mathrm{supported}_e^\sharp(\rho, \langle e, \sigma^\sharp\rangle)$ does not hold, where $e$ is the expression being evaluated. This side condition will be left implicit in order not to clutter the presentation.
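Constant.

$$\rho \vdash_\beta \langle \mathit{con}, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \alpha(\{\mathit{con}\}) \otimes \sigma^\sharp, \mathrm{none}^\sharp\bigr\rangle \tag{86}$$

Identifier.

$$\rho \vdash_\beta \langle \mathit{id}, \sigma^\sharp\rangle \rightsquigarrow \sigma^\sharp\bigl[\rho(\mathit{id})\bigr] \tag{87}$$

Unary minus.

$$\frac{\rho \vdash_\beta \langle e, \sigma^\sharp\rangle \to \bigl\langle \langle m^\sharp, \sigma_0^\sharp\rangle, \varepsilon^\sharp\bigr\rangle}{\rho \vdash_\beta \langle -e, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle \ominus m^\sharp, \sigma_0^\sharp\rangle, \varepsilon^\sharp\bigr\rangle} \tag{88}$$

Binary arithmetic operations. Let $\boxdot \in \{+, -, *, /, \%\}$ be a syntactic operator and $\circledast \in \{\oplus, \ominus, \odot, \oslash, \circledcirc\}$ denote the corresponding abstract operation. Then the abstract rules for addition, subtraction, multiplication, division and remainder are given by the following schemata:

$$\frac{\rho \vdash_\beta \langle e_0, \sigma^\sharp\rangle \to \bigl\langle \langle m_0^\sharp, \sigma_0^\sharp\rangle, \varepsilon_0^\sharp\bigr\rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0^\sharp\rangle \to \bigl\langle \langle m_1^\sharp, \sigma_1^\sharp\rangle, \varepsilon_1^\sharp\bigr\rangle}{\rho \vdash_\beta \langle e_0 \boxdot e_1, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle m_0^\sharp \circledast m_1^\sharp, \sigma_1^\sharp\rangle, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp\bigr\rangle} \tag{89}$$

if $\boxdot \notin \{/, \%\}$ or $0 \not\propto m_1^\sharp$.

$$\frac{\rho \vdash_\beta \langle e_0, \sigma^\sharp\rangle \to \bigl\langle \langle m_0^\sharp, \sigma_0^\sharp\rangle, \varepsilon_0^\sharp\bigr\rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0^\sharp\rangle \to \bigl\langle \langle m_1^\sharp, \sigma_1^\sharp\rangle, \varepsilon_1^\sharp\bigr\rangle}{\rho \vdash_\beta \langle e_0 \boxdot e_1, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle m_0^\sharp \circledast m_1^\sharp, \sigma_1^\sharp\rangle, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp \sqcup \varepsilon_2^\sharp\bigr\rangle} \tag{90}$$

if $\boxdot \in \{/, \%\}$, $0 \propto m_1^\sharp$ and $\varepsilon_2^\sharp = \sigma_1^\sharp \otimes \alpha(\{\mathrm{divbyzero}\})$.

Arithmetic tests. Let $\bowtie \in \{=, \neq, <, \leq, \geq, >\}$ be an abstract syntax operator and let $\bowtie^\sharp \colon (\mathrm{Integer}^\sharp \times \mathrm{Integer}^\sharp) \to \mathrm{Bool}^\sharp$ denote the corresponding abstract test operation in $\{=^\sharp, \neq^\sharp, <^\sharp, \leq^\sharp, \geq^\sharp, >^\sharp\}$. Then the rules for the abstract arithmetic tests are given by

$$\frac{\rho \vdash_\beta \langle e_0, \sigma^\sharp\rangle \to \bigl\langle \langle m_0^\sharp, \sigma_0^\sharp\rangle, \varepsilon_0^\sharp\bigr\rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0^\sharp\rangle \to \bigl\langle \langle m_1^\sharp, \sigma_1^\sharp\rangle, \varepsilon_1^\sharp\bigr\rangle}{\rho \vdash_\beta \langle e_0 \bowtie e_1, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle m_0^\sharp \bowtie^\sharp m_1^\sharp, \sigma_1^\sharp\rangle, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp\bigr\rangle} \tag{91}$$

Negation.

$$\frac{\rho \vdash_\beta \langle b, \sigma^\sharp\rangle \to \bigl\langle \langle t^\sharp, \sigma_0^\sharp\rangle, \varepsilon^\sharp\bigr\rangle}{\rho \vdash_\beta \langle \mathbf{not}\ b, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle \neg^\sharp t^\sharp, \sigma_0^\sharp\rangle, \varepsilon^\sharp\bigr\rangle} \tag{92}$$

Conjunction.

$$\frac{\rho \vdash_\beta \langle b_0, \sigma^\sharp\rangle \to \langle \upsilon_0^\sharp, \varepsilon_0^\sharp\rangle \quad \rho \vdash_\beta \langle b_1, \sigma_{\mathrm{tt}}^\sharp\rangle \to \langle \upsilon_1^\sharp, \varepsilon_1^\sharp\rangle}{\rho \vdash_\beta \langle b_0\ \mathbf{and}\ b_1, \sigma^\sharp\rangle \rightsquigarrow \langle \upsilon_{\mathrm{ff}}^\sharp \sqcup \upsilon_1^\sharp, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp\rangle} \tag{93}$$

if $\sigma_{\mathrm{tt}}^\sharp = \phi(\rho, \sigma^\sharp, b_0)$, $\sigma_{\mathrm{ff}}^\sharp = \phi(\rho, \sigma^\sharp, \mathbf{not}\ b_0)$ and $\upsilon_{\mathrm{ff}}^\sharp = \alpha(\{\mathrm{ff}\}) \otimes \sigma_{\mathrm{ff}}^\sharp$.

Disjunction.

$$\frac{\rho \vdash_\beta \langle b_0, \sigma^\sharp\rangle \to \langle \upsilon_0^\sharp, \varepsilon_0^\sharp\rangle \quad \rho \vdash_\beta \langle b_1, \sigma_{\mathrm{ff}}^\sharp\rangle \to \langle \upsilon_1^\sharp, \varepsilon_1^\sharp\rangle}{\rho \vdash_\beta \langle b_0\ \mathbf{or}\ b_1, \sigma^\sharp\rangle \rightsquigarrow \langle \upsilon_{\mathrm{tt}}^\sharp \sqcup \upsilon_1^\sharp, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp\rangle} \tag{94}$$

if $\sigma_{\mathrm{tt}}^\sharp = \phi(\rho, \sigma^\sharp, b_0)$, $\sigma_{\mathrm{ff}}^\sharp = \phi(\rho, \sigma^\sharp, \mathbf{not}\ b_0)$ and $\upsilon_{\mathrm{tt}}^\sharp = \alpha(\{\mathrm{tt}\}) \otimes \sigma_{\mathrm{tt}}^\sharp$.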
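Rules (89) and (90) can be read as an evaluation procedure. The Python sketch below follows them for '+' and '/' on integer intervals, under simplifying assumptions that are ours rather than the paper's: the abstract memory is omitted (so $\sigma_0^\sharp$, $\sigma_1^\sharp$ and the memory component of $\varepsilon_2^\sharp$ disappear), integer division is assumed to truncate toward zero, and abstract exception states are just sets of exception names. The key point is the side condition: `divbyzero` is added exactly when $0 \propto m_1^\sharp$.

```python
from typing import FrozenSet, List, Optional, Tuple

Interval = Optional[Tuple[int, int]]          # None is bottom
Result = Tuple[Interval, FrozenSet[str]]      # (abstract value, possible exceptions)


def tdiv(a: int, b: int) -> int:
    """Integer division truncating toward zero (assumed concrete semantics)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def add(x: Interval, y: Interval) -> Interval:
    return None if x is None or y is None else (x[0] + y[0], x[1] + y[1])


def div(x: Interval, y: Interval) -> Interval:
    """Covers every quotient m0 / m1 with m0 in x and nonzero m1 in y."""
    if x is None or y is None:
        return None
    pts: List[int] = []
    for lo, hi in ((y[0], min(y[1], -1)), (max(y[0], 1), y[1])):
        if lo <= hi:
            pts += [tdiv(a, b) for a in x for b in (lo, hi)]
    return (min(pts), max(pts)) if pts else None


ARITH = {"+": add, "/": div}


def eval_binop(op: str, res0: Result, res1: Result) -> Result:
    """Combine the results of the two premises as in rules (89) and (90)."""
    (m0, exc0), (m1, exc1) = res0, res1
    value = ARITH[op](m0, m1)        # m0♯ ⊛ m1♯, strict in both arguments
    exc = exc0 | exc1                 # ε0♯ ⊔ ε1♯
    if op == "/" and m1 is not None and m1[0] <= 0 <= m1[1]:
        exc = exc | {"divbyzero"}     # rule (90): 0 ∝ m1♯ adds ε2♯
    return value, exc
```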

6.9.2 Unsupported Declarations. The following rules only apply if the condition $\mathrm{supported}^\sharp(\rho, \langle q, \sigma^\sharp\rangle)$ does not hold, where $q \in \mathrm{Decl} \uplus \mathrm{Glob}$ is the declaration being evaluated. Again, this side condition is left implicit.



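Nil.

$$\rho \vdash_\beta \langle \mathbf{nil}, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle \emptyset, \sigma^\sharp\rangle, \mathrm{none}^\sharp\bigr\rangle \tag{95}$$

Environment.

$$\rho \vdash_\beta \langle \rho_0, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle \rho_0, \sigma^\sharp\rangle, \mathrm{none}^\sharp\bigr\rangle \tag{96}$$

Recursive environment.

$$\rho \vdash_\beta \langle \mathbf{rec}\ \rho_0, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle \rho_1, \sigma^\sharp\rangle, \mathrm{none}^\sharp\bigr\rangle \tag{97}$$

if

$$\rho_1 = \bigl\{ \mathit{id} \mapsto \rho_0(\mathit{id}) \bigm| \rho_0(\mathit{id}) = \lambda \mathit{fps} \,.\, \mathbf{extern} : \mathrm{sT} \bigr\} \cup \left\{ \mathit{id} \mapsto \mathit{abs}_1 \;\middle|\; \begin{array}{l} \forall i \in \{0, 1\} : \mathit{abs}_i = \lambda \mathit{fps} \,.\, \mathbf{let}\ d_i\ \mathbf{in}\ s\ \mathbf{result}\ e, \\ \rho_0(\mathit{id}) = \mathit{abs}_0,\ d_1 = \mathbf{rec}\bigl(\rho_0 \setminus \mathrm{DI}(\mathit{fps})\bigr); d_0 \end{array} \right\}.$$

Global variable declaration.

$$\frac{\rho \vdash_\beta \langle e, \sigma^\sharp\rangle \to \langle \upsilon^\sharp, \varepsilon_0^\sharp\rangle}{\rho \vdash_\beta \langle \mathbf{gvar}\ \mathit{id} : \mathrm{sT} = e, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle \rho_1, \sigma_1^\sharp\rangle, \mathrm{cleanup}_d^\sharp(\varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp)\bigr\rangle} \tag{98}$$

if $\mathrm{new}_d^\sharp(\upsilon^\sharp) = \bigl(\langle \sigma_1^\sharp, l\rangle, \varepsilon_1^\sharp\bigr)$ and $\rho_1 = \{ \mathit{id} \mapsto (l, \mathrm{sT}) \}$.

Local variable declaration.

$$\frac{\rho \vdash_\beta \langle e, \sigma^\sharp\rangle \to \langle \upsilon^\sharp, \varepsilon_0^\sharp\rangle}{\rho \vdash_\beta \langle \mathbf{lvar}\ \mathit{id} : \mathrm{sT} = e, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle \rho_1, \sigma_1^\sharp\rangle, \mathrm{unmark}_s^\sharp(\varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp)\bigr\rangle} \tag{99}$$

if $\mathrm{new}_s^\sharp(\upsilon^\sharp) = \bigl(\langle \sigma_1^\sharp, i\rangle, \varepsilon_1^\sharp\bigr)$ and $\rho_1 = \{ \mathit{id} \mapsto (i, \mathrm{sT}) \}$.

Function declaration.

$$\rho \vdash_\beta \langle \mathbf{function}\ \mathit{id}(\mathit{fps}) = \mathit{body}_0, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle \rho_0, \sigma^\sharp\rangle, \mathrm{none}^\sharp\bigr\rangle \tag{100}$$

if $\rho_0 = \{ \mathit{id} \mapsto \lambda \mathit{fps} \,.\, \mathit{body}_1 \}$ and either $\mathit{body}_0 = \mathit{body}_1 = \mathbf{extern} : \mathrm{sT}$ or, for each $i \in \{0, 1\}$, $\mathit{body}_i = \mathbf{let}\ d_i\ \mathbf{in}\ s\ \mathbf{result}\ e$, $I = \mathrm{FI}(\mathit{body}_0) \setminus \mathrm{DI}(\mathit{fps})$ and $d_1 = \rho|_I ; d_0$.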
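Recursive declaration.

$$\frac{(\rho \setminus J) \vdash_{\beta[\beta_1]} \langle g, \sigma^\sharp\rangle \to \bigl\langle \langle \rho_0, \sigma_0^\sharp\rangle, \mathrm{none}^\sharp\bigr\rangle \quad \rho \vdash_\beta \langle \mathbf{rec}\ \rho_0, \sigma_0^\sharp\rangle \to \eta^\sharp}{\rho \vdash_\beta \langle \mathbf{rec}\ g, \sigma^\sharp\rangle \rightsquigarrow \eta^\sharp} \tag{101}$$

if $J = \mathrm{FI}(g) \cap \mathrm{DI}(g)$, $\beta \vdash_{\mathrm{FI}(g)} g : \beta_0$ and $\beta_1 = \beta_0|_J$.

Global sequential composition.

$$\frac{\rho \vdash_\beta \langle g_0, \sigma^\sharp\rangle \to \bigl\langle \langle \rho_0, \sigma_0^\sharp\rangle, \varepsilon_0^\sharp\bigr\rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle g_1, \sigma_0^\sharp\rangle \to \bigl\langle \langle \rho_1, \sigma_1^\sharp\rangle, \varepsilon_1^\sharp\bigr\rangle}{\rho \vdash_\beta \langle g_0; g_1, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle \rho_0[\rho_1], \sigma_1^\sharp\rangle, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp\bigr\rangle} \tag{102}$$

if $\beta \vdash_I g_0 : \beta_0$ and $\mathrm{FI}(g_0) \subseteq I$.

Local sequential composition.

$$\frac{\rho \vdash_\beta \langle d_0, \sigma^\sharp\rangle \to \bigl\langle \langle \rho_0, \sigma_0^\sharp\rangle, \varepsilon_0^\sharp\bigr\rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle d_1, \sigma_0^\sharp\rangle \to \bigl\langle \langle \rho_1, \sigma_1^\sharp\rangle, \varepsilon_1^\sharp\bigr\rangle}{\rho \vdash_\beta \langle d_0; d_1, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \langle \rho_0[\rho_1], \sigma_1^\sharp\rangle, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp\bigr\rangle} \tag{103}$$

if $\beta \vdash_I d_0 : \beta_0$ and $\mathrm{FI}(d_0) \subseteq I$.

6.9.3 Unsupported Statements. The following rules only apply if the implicit side condition $\mathrm{supported}_s^\sharp(\rho, \langle s, \sigma^\sharp\rangle)$ does not hold, where $s$ is the statement being evaluated.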



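Nop.

$$\rho \vdash_\beta \langle \mathbf{nop}, \sigma^\sharp\rangle \rightsquigarrow \sigma^\sharp \tag{104}$$

Assignment.

$$\frac{\rho \vdash_\beta \langle e, \sigma^\sharp\rangle \to \bigl\langle \langle \mathrm{sval}^\sharp, \sigma_0^\sharp\rangle, \varepsilon_0^\sharp\bigr\rangle}{\rho \vdash_\beta \langle \mathit{id} := e, \sigma^\sharp\rangle \rightsquigarrow \langle \sigma_1^\sharp, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp\rangle} \tag{105}$$

if $\sigma_0^\sharp\bigl[\rho(\mathit{id}) :=^\sharp \mathrm{sval}^\sharp\bigr] = (\sigma_1^\sharp, \varepsilon_1^\sharp)$.

Statement sequence.

$$\frac{\rho \vdash_\beta \langle s_0, \sigma^\sharp\rangle \to \langle \sigma_0^\sharp, \varepsilon_0^\sharp\rangle \quad \rho \vdash_\beta \langle s_1, \sigma_0^\sharp\rangle \to \langle \sigma_1^\sharp, \varepsilon_1^\sharp\rangle}{\rho \vdash_\beta \langle s_0; s_1, \sigma^\sharp\rangle \rightsquigarrow \langle \sigma_1^\sharp, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp\rangle} \tag{106}$$

Block.

$$\frac{\rho \vdash_\beta \langle d, \mathrm{mark}_s^\sharp(\sigma^\sharp)\rangle \to \bigl\langle \langle \rho_0, \sigma_0^\sharp\rangle, \varepsilon_0^\sharp\bigr\rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle s, \sigma_0^\sharp\rangle \to \langle \sigma_1^\sharp, \varepsilon_1^\sharp\rangle}{\rho \vdash_\beta \langle d; s, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \mathrm{unmark}_s^\sharp(\sigma_1^\sharp), \varepsilon_0^\sharp \sqcup \mathrm{unmark}_s^\sharp(\varepsilon_1^\sharp)\bigr\rangle} \tag{107}$$

if $\beta \vdash_{\mathrm{FI}(d)} d : \beta_0$.

Conditional.

$$\frac{\rho \vdash_\beta \langle e, \sigma^\sharp\rangle \to \langle \upsilon_0^\sharp, \varepsilon_0^\sharp\rangle \quad \rho \vdash_\beta \langle s_0, \sigma_{\mathrm{tt}}^\sharp\rangle \to \langle \sigma_1^\sharp, \varepsilon_1^\sharp\rangle \quad \rho \vdash_\beta \langle s_1, \sigma_{\mathrm{ff}}^\sharp\rangle \to \langle \sigma_2^\sharp, \varepsilon_2^\sharp\rangle}{\rho \vdash_\beta \langle \mathbf{if}\ e\ \mathbf{then}\ s_0\ \mathbf{else}\ s_1, \sigma^\sharp\rangle \rightsquigarrow \langle \sigma_1^\sharp \sqcup \sigma_2^\sharp, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp \sqcup \varepsilon_2^\sharp\rangle} \tag{108}$$

if $\sigma_{\mathrm{tt}}^\sharp = \phi(\rho, \sigma^\sharp, e)$ and $\sigma_{\mathrm{ff}}^\sharp = \phi(\rho, \sigma^\sharp, \mathbf{not}\ e)$.

While.

$$\frac{\rho \vdash_\beta \langle e, \sigma^\sharp\rangle \to \langle \upsilon_0^\sharp, \varepsilon_0^\sharp\rangle \quad \rho \vdash_\beta \langle s, \sigma_{\mathrm{tt}}^\sharp\rangle \to \langle \sigma_1^\sharp, \varepsilon_1^\sharp\rangle \quad \rho \vdash_\beta \langle \mathbf{while}\ e\ \mathbf{do}\ s, \sigma_1^\sharp\rangle \to \langle \sigma_2^\sharp, \varepsilon_2^\sharp\rangle}{\rho \vdash_\beta \langle \mathbf{while}\ e\ \mathbf{do}\ s, \sigma^\sharp\rangle \rightsquigarrow \langle \sigma_{\mathrm{ff}}^\sharp \sqcup \sigma_2^\sharp, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp \sqcup \varepsilon_2^\sharp\rangle} \tag{109}$$

if $\sigma_{\mathrm{tt}}^\sharp = \phi(\rho, \sigma^\sharp, e)$ and $\sigma_{\mathrm{ff}}^\sharp = \phi(\rho, \sigma^\sharp, \mathbf{not}\ e)$.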
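Because rule (109) unfolds the loop indefinitely, an analyzer will typically compute an over-approximation of the loop-head states directly rather than build the tree. The Python sketch below does this for the specific loop `while x < c do x := x + step` on a single interval variable. It relies on assumptions that go beyond the rule schema: a non-relational interval memory for one variable, the standard interval widening to force termination, and one decreasing (narrowing) iteration to recover precision. Exceptions are ignored in this sketch, and all names are hypothetical.

```python
import math
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]  # bounds may be +/- infinity; None is bottom


def join(a: Interval, b: Interval) -> Interval:
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(a: Interval, b: Interval) -> Interval:
    """Standard interval widening: any bound that grows jumps to infinity."""
    if a is None:
        return b
    if b is None:
        return a
    lo = a[0] if b[0] >= a[0] else -math.inf
    hi = a[1] if b[1] <= a[1] else math.inf
    return (lo, hi)


def guard(a: Interval, c: int, holds: bool) -> Interval:
    """Filter φ for the guard x < c (holds=True) or not (x < c) (holds=False)."""
    if a is None:
        return None
    lo, hi = (a[0], min(a[1], c - 1)) if holds else (max(a[0], c), a[1])
    return (lo, hi) if lo <= hi else None


def body(a: Interval, step: int) -> Interval:
    """Abstract effect of x := x + step."""
    return None if a is None else (a[0] + step, a[1] + step)


def analyze_while(init: Interval, c: int, step: int) -> Interval:
    """Abstract value of x on exit from: while x < c do x := x + step."""
    head = init
    while True:  # increasing iterations with widening
        new_head = widen(head, join(head, body(guard(head, c, True), step)))
        if new_head == head:
            break
        head = new_head
    # One decreasing iteration: still an over-approximation of all loop-head values.
    head = join(init, body(guard(head, c, True), step))
    return guard(head, c, False)  # σ_ff♯: memory on loop exit
```

For instance, starting from $x \in [0, 0]$ with $c = 10$ and a unit step, widening first yields $[0, +\infty]$ at the loop head, the decreasing iteration refines it to $[0, 10]$, and the exit filter gives $[10, 10]$.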
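Throw.

$$\rho \vdash_\beta \langle \mathbf{throw}\ \chi, \sigma^\sharp\rangle \rightsquigarrow \langle \bot, \varepsilon^\sharp\rangle \tag{110}$$

if $\varepsilon^\sharp = \sigma^\sharp \otimes \alpha(\{\chi\})$.

$$\frac{\rho \vdash_\beta \langle e, \sigma^\sharp\rangle \to \bigl\langle \langle \mathrm{sval}^\sharp, \sigma_0^\sharp\rangle, \varepsilon_0^\sharp\bigr\rangle}{\rho \vdash_\beta \langle \mathbf{throw}\ e, \sigma^\sharp\rangle \rightsquigarrow \langle \bot, \varepsilon_0^\sharp \sqcup \varepsilon_1^\sharp\rangle} \tag{111}$$

if $\varepsilon_1^\sharp = \sigma_0^\sharp \otimes \mathrm{sval}^\sharp$.

Try blocks.

$$\frac{\rho \vdash_\beta \langle s, \sigma^\sharp\rangle \to \langle \sigma_0^\sharp, \varepsilon_0^\sharp\rangle \quad \rho \vdash_\beta \langle k, \varepsilon_0^\sharp\rangle \to \bigl\langle \langle \sigma_1^\sharp, \varepsilon_1^\sharp\rangle, \varepsilon_2^\sharp\bigr\rangle}{\rho \vdash_\beta \langle \mathbf{try}\ s\ \mathbf{catch}\ k, \sigma^\sharp\rangle \rightsquigarrow \langle \sigma_0^\sharp \sqcup \sigma_1^\sharp, \varepsilon_1^\sharp \sqcup \varepsilon_2^\sharp\rangle} \tag{112}$$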

(try s finally 8l ,a*) (a», 4 U 4 U (trg ® £?)) 

Function call. With reference to conditions (60) and (61) of the concrete rules 
for function calls, the corresponding abstract rule schema is 

p^p (d, markka 11 )) ((p , <4), 4) 

h /?[/3i] (body, link" (<rg)) (<4,4) 
P[go] h /3[/3o] ( id o : = x ,unlinkf(o-})) -> {a\,e\) 



(115) 



p 1-/3 (id := id(ei, . . . ,e„),cr tl ) ~» (unmarkf (ct|), £ J ) 
if (60) and (61) hold and e" = £q U unmarkf (unlink^ (4)) U unmarkjj(4)- 
6.9.4 Function Bodies. 

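$$\frac{\rho \vdash_\beta \langle d, \mathrm{mark}_s^\sharp(\sigma^\sharp)\rangle \to \bigl\langle \langle \rho_0, \sigma_0^\sharp\rangle, \varepsilon_0^\sharp\bigr\rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle s, \sigma_0^\sharp\rangle \to \langle \sigma_1^\sharp, \varepsilon_1^\sharp\rangle \quad \rho[\rho_0] \vdash_{\beta[\beta_0]} \langle x := e, \sigma_1^\sharp\rangle \to \langle \sigma_2^\sharp, \varepsilon_2^\sharp\rangle}{\rho \vdash_\beta \langle \mathbf{let}\ d\ \mathbf{in}\ s\ \mathbf{result}\ e, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \mathrm{unmark}_s^\sharp(\sigma_2^\sharp), \varepsilon_3^\sharp\bigr\rangle}$$

if $\beta \vdash_{\mathrm{FI}(d)} d : \beta_0$ and $\varepsilon_3^\sharp = \varepsilon_0^\sharp \sqcup \mathrm{unmark}_s^\sharp(\varepsilon_1^\sharp \sqcup \varepsilon_2^\sharp)$.

$$\rho \vdash_\beta \langle \mathbf{extern} : \mathrm{sT}, \sigma^\sharp\rangle \rightsquigarrow \bigl\langle \sigma_1^\sharp, (\sigma_1^\sharp, \top)\bigr\rangle \tag{116}$$

if $\forall \sigma, \sigma_0 \in \mathrm{Mem} : \bigl(\sigma = (\mu, w) \land \sigma \propto \sigma^\sharp \land \sigma_0 = (\mu_0, w)\bigr) \implies \sigma_0 \propto \sigma_1^\sharp$.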
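6.9.5 Catch Clauses

Catch.

$$\frac{\rho \vdash_\beta \langle s, \mathrm{mem}(\varepsilon_0^\sharp)\rangle \to \eta_1^\sharp}{\rho \vdash_\beta \langle (p)\, s, \varepsilon^\sharp\rangle \rightsquigarrow \langle \eta_1^\sharp, \varepsilon_1^\sharp\rangle} \tag{117}$$

if $p = \mathbf{any}$ or $p = \chi$ or $p = \mathrm{cT}$, $\varepsilon_0^\sharp = \phi^+(p, \varepsilon^\sharp)$ and $\varepsilon_1^\sharp = \phi^-(p, \varepsilon^\sharp)$.

$$\frac{\rho\bigl[\{\mathit{id} \mapsto (i, \mathrm{sT})\}\bigr] \vdash_{\beta[\{\mathit{id} \mapsto \mathrm{sT}\ \mathrm{loc}\}]} \langle s, \sigma_1^\sharp\rangle \to \langle \sigma_2^\sharp, \varepsilon_3^\sharp\rangle}{\rho \vdash_\beta \langle (\mathit{id} : \mathrm{sT})\, s, \varepsilon^\sharp\rangle \rightsquigarrow \bigl\langle \langle \sigma_3^\sharp, \varepsilon_4^\sharp\rangle, \varepsilon_1^\sharp\bigr\rangle} \tag{118}$$

if $\varepsilon_0^\sharp = \phi^+(\mathrm{sT}, \varepsilon^\sharp)$, $\varepsilon_1^\sharp = \phi^-(\mathrm{sT}, \varepsilon^\sharp)$, $\mathrm{new}_s^\sharp\bigl(\mathrm{sT}(\varepsilon_0^\sharp), \mathrm{mark}_s^\sharp(\mathrm{mem}(\varepsilon_0^\sharp))\bigr) = \bigl(\langle \sigma_1^\sharp, i\rangle, \varepsilon_2^\sharp\bigr)$, $\sigma_3^\sharp = \mathrm{unmark}_s^\sharp(\sigma_2^\sharp)$ and $\varepsilon_4^\sharp = \mathrm{unmark}_s^\sharp(\varepsilon_2^\sharp) \sqcup \mathrm{unmark}_s^\sharp(\varepsilon_3^\sharp)$.

Catch sequence.

$$\frac{\rho \vdash_\beta \langle k_0, \varepsilon^\sharp\rangle \to \bigl\langle \langle \sigma_0^\sharp, \varepsilon_0^\sharp\rangle, \varepsilon_1^\sharp\bigr\rangle \quad \rho \vdash_\beta \langle k_1, \varepsilon_1^\sharp\rangle \to \bigl\langle \langle \sigma_1^\sharp, \varepsilon_2^\sharp\rangle, \varepsilon_3^\sharp\bigr\rangle}{\rho \vdash_\beta \langle k_0; k_1, \varepsilon^\sharp\rangle \rightsquigarrow \bigl\langle \langle \sigma_0^\sharp \sqcup \sigma_1^\sharp, \varepsilon_0^\sharp \sqcup \varepsilon_2^\sharp\rangle, \varepsilon_3^\sharp\bigr\rangle} \tag{119}$$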

6.9.6 Supported Expressions, Declarations and Statements. Let q ∈ {e, d, g, s} and N♯ ∈ Γ♯_q. Then, whenever supported♯(ρ, N♯) holds, alternate versions of the rules above apply. For each of the rules above,

$$\frac{P_0 \quad \cdots \quad P_{\ell-1}}{\rho \vdash_\beta N^\sharp \rightarrow \eta^\sharp} \quad \text{if (side condition) and not } \mathrm{supported}^\sharp(\rho, N^\sharp)$$

we also have the rule

$$\frac{P_0 \quad \cdots \quad P_{\ell-1}}{\rho \vdash_\beta N^\sharp \rightarrow \mathrm{eval}^\sharp(\rho, N^\sharp)} \quad \text{if (side condition) and } \mathrm{supported}^\sharp(\rho, N^\sharp)$$

Notice that even if eval♯(ρ, N♯) does not depend on the rule antecedents P_0, …, P_{ℓ−1}, these cannot be omitted, as this would neglect the sub-computations spawned by the unsupported evaluation of N♯.
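To see why the supported♯/eval♯ mechanism matters for relational domains, consider the expression x - x. A non-relational interval domain evaluates it compositionally and loses the correlation between the two occurrences of x, whereas a relational domain can answer 0 directly. The sketch below assumes a hypothetical domain that "supports" only syntactic differences v - v; the names `supported`, `eval_sharp` and `analyze` are illustrative. It also mirrors the remark above: the subexpressions are still analyzed and kept as subtrees even when the shortcut is taken.

```python
from typing import Dict, List, Tuple, Union

Interval = Tuple[int, int]
Expr = Union[int, str, tuple]     # constant, variable, or ("-", e0, e1)


def eval_interval(e: Expr, mem: Dict[str, Interval]) -> Interval:
    """Compositional (non-relational) interval evaluation; only subtraction is modeled."""
    if isinstance(e, int):
        return (e, e)
    if isinstance(e, str):
        return mem[e]
    _, e0, e1 = e
    (a0, a1), (b0, b1) = eval_interval(e0, mem), eval_interval(e1, mem)
    return (a0 - b1, a1 - b0)


def supported(e: Expr) -> bool:
    """Hypothetical relational capability: recognizes v - v for a variable v."""
    return isinstance(e, tuple) and isinstance(e[1], str) and e[1] == e[2]


def eval_sharp(e: Expr, mem: Dict[str, Interval]) -> Interval:
    return (0, 0)                 # v - v is always 0


def analyze(e: Expr, mem: Dict[str, Interval]) -> Tuple[Expr, Interval, List]:
    """Builds (expression, result, subtrees); subtrees are kept in both cases."""
    children = [analyze(sub, mem) for sub in e[1:]] if isinstance(e, tuple) else []
    value = eval_sharp(e, mem) if supported(e) else eval_interval(e, mem)
    return (e, value, children)


mem = {"x": (0, 10)}
print(eval_interval(("-", "x", "x"), mem))   # (-10, 10)
print(analyze(("-", "x", "x"), mem)[1])      # (0, 0)
```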

6.10 Abstract Semantics Trees 

We now define possibly infinite abstract semantics trees along the lines of what 
we did in Section 5.7. Notice that the need to consider infinite abstract trees goes 
beyond the need to observe infinite concrete computations. For instance, there is 
no finite abstract tree corresponding to a program containing a while command, 
because (109) is the only abstract rule for while and it recursively introduces a 
new while node into the tree. 
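Since rule (109) unfolds a while loop into an infinite abstract tree, a practical analyzer has to summarize that tree finitely. One standard technique, not necessarily the one adopted later in this paper, is to iterate on the loop head with a widening operator. The sketch below assumes intervals over the integers extended with ±∞ and the loop `while x < bound do x := x + 1`; all names are illustrative. Plain joins climb one step per iteration, whereas widening stabilizes at once, at some cost in precision (a subsequent narrowing step would normally recover the upper bound).

```python
import math
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]   # None is bottom


def join(a: Interval, b: Interval) -> Interval:
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(a: Interval, b: Interval) -> Interval:
    """Standard interval widening: any bound that grows jumps to infinity."""
    if a is None:
        return b
    if b is None:
        return a
    lo = a[0] if b[0] >= a[0] else -math.inf
    hi = a[1] if b[1] <= a[1] else math.inf
    return (lo, hi)


def filter_lt(a: Interval, c: int) -> Interval:
    """Refines a to the values satisfying x < c."""
    if a is None or a[0] >= c:
        return None
    return (a[0], min(a[1], c - 1))


def loop_head(init: Interval, bound: int, use_widening: bool = True,
              max_iter: int = 1000) -> Tuple[Interval, int]:
    """Loop-head invariant of: while x < bound do x := x + 1."""
    head = init
    for step in range(1, max_iter + 1):
        body_in = filter_lt(head, bound)
        body_out = None if body_in is None else (body_in[0] + 1, body_in[1] + 1)
        new = join(init, body_out)
        nxt = widen(head, new) if use_widening else join(head, new)
        if nxt == head:
            return head, step
        head = nxt
    return head, max_iter          # gave up: not stabilized


print(loop_head((0, 0), 5, use_widening=False))   # ((0, 5), 6)
print(loop_head((0, 0), 5))                       # ((0, inf), 2)
```

With a very large bound and no widening, the iteration hits `max_iter` without stabilizing, which is the finite-time analogue of the infinite tree.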

Definition 6.8. (Abstract semantics rules.) The set ℛ♯ of abstract semantics rules is the infinite set obtained by instantiating the rule schemata of Section 6.9 in all possible ways (respecting the side conditions).

Let 𝒮♯ be the (infinite) set of sequents occurring in the premises and conclusions of the rules in ℛ♯. Matching Definition 5.6, the abstract semantics universe, denoted by 𝒰♯, is the set of finitely branching trees of at most ω-depth with labels in 𝒮♯.

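Definition 6.9. (Abstract semantics trees.) Let ℱ♯ : ℘(𝒰♯) → ℘(𝒰♯) be given, for each U♯ ∈ ℘(𝒰♯), by

$$\mathcal{F}^\sharp(U^\sharp) \stackrel{\mathrm{def}}{=} \left\{ \frac{\theta_0\ \cdots\ \theta_{\ell-1}}{s} \;\middle|\; \theta_0, \ldots, \theta_{\ell-1} \in U^\sharp,\ \frac{\theta_0(\epsilon)\ \cdots\ \theta_{\ell-1}(\epsilon)}{s} \in \mathcal{R}^\sharp \right\}.$$

The set of abstract semantics trees is Θ♯ = gfp_⊆(ℱ♯).

We now show that, for every non-terminal abstract configuration, there exists an abstract tree whose root is a sequent for that configuration.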

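Proposition 6.10. For each β ∈ TEnv and ρ ∈ Env such that ρ : β, and each N♯ ∈ Γ♯_q, where q ∈ {e, d, g, s, b, k}, there exists θ♯ ∈ Θ♯ such that

$$\theta^\sharp(\epsilon) \in \bigl\{ (\rho \vdash_\beta N^\sharp \rightarrow \eta^\sharp) \bigm| \eta^\sharp \in T^\sharp_q \bigr\}.$$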

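Proof. For the proof, let⁸

$$S^\sharp(\rho, \beta, N^\sharp) \stackrel{\mathrm{def}}{=} \bigl\{ s^\sharp \bigm| s^\sharp = (\rho \vdash_\beta N^\sharp \rightarrow \eta^\sharp),\ \exists s \,.\, (s \propto s^\sharp \text{ and } s \text{ is well-typed}) \bigr\}.$$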

We now assume that N♯ ∈ Γ♯_q is a fixed but arbitrary non-terminal abstract configuration. Suppose that supported♯(ρ, N♯) does not hold. By inspecting the abstract evaluation rules given in Section 6.9, it can be seen that there exist ℓ ≥ 0 and a nonempty set of rules ℛ_0 ⊆ ℛ♯ with ℓ premises and a conclusion in S♯(ρ, β, N♯). If, on the other hand, supported♯(ρ, N♯) does hold, then it follows from Section 6.9.6 that, by Definition 6.7, eval♯(ρ, N♯) is defined and, for each rule in ℛ_0, there is a rule with the same set of premises but where the conclusion (ρ ⊢_β N♯ → eval♯(ρ, N♯)) is also in S♯(ρ, β, N♯). Thus, in both cases, by definition of 𝒰♯, there exists a tree in 𝒰♯ with root in S♯(ρ, β, N♯).

We prove that, for any n ∈ ℕ, there exists a tree θ♯ ∈ ℱ♯ⁿ(𝒰♯) such that θ♯(ϵ) ∈ S♯(ρ, β, N♯). To this end, we reason by induction on n ≥ 0. In the case n = 0, ℱ♯⁰(𝒰♯) = 𝒰♯, so that the hypothesis holds.

We now suppose that n > 0. Let j ∈ {0, …, ℓ} be the maximal value for which there exist trees θ♯_0, …, θ♯_{j−1} ∈ ℱ♯ⁿ⁻¹(𝒰♯) where P_0 = θ♯_0(ϵ), …, P_{j−1} = θ♯_{j−1}(ϵ) are the first j premises of a rule in ℛ_0; let ℛ_j ⊆ ℛ_0 be the set of all rules in ℛ_0 with P_0, …, P_{j−1} as their first j premises; then ℛ_j ≠ ∅. We assume that j < ℓ and derive a contradiction. By inspecting the rule schemata in Section 6.9, it can be seen that, if there exists a rule (P_0 ⋯ P_{j−1} P_j ⋯ / s♯) ∈ ℛ_j for some P_j ∈ S♯(ρ_j, β_j, N♯_j) and s♯ ∈ S♯(ρ, β, N♯), then

$$\forall P'_j \in S^\sharp(\rho_j, \beta_j, N^\sharp_j) : \exists s'^\sharp \in S^\sharp(\rho, \beta, N^\sharp) \,.\, \frac{P_0\ \cdots\ P_{j-1}\ P'_j\ \cdots}{s'^\sharp} \in \mathcal{R}_j. \tag{120}$$

By the inductive hypothesis, there exists θ♯_j ∈ ℱ♯ⁿ⁻¹(𝒰♯) such that P'_j = θ♯_j(ϵ) ∈ S♯(ρ_j, β_j, N♯_j); hence, by (120), there must be a rule in ℛ_j whose (j+1)-th premise is P'_j, contradicting the assumption that j < ℓ is maximal. Hence j = ℓ. Thus there exists a rule (P_0 ⋯ P_{ℓ−1} / s♯) ∈ ℛ_0 for some s♯ ∈ S♯(ρ, β, N♯); hence, by Definition 6.9, the tree (θ♯_0 ⋯ θ♯_{ℓ−1} / s♯) belongs to ℱ♯ⁿ(𝒰♯). Therefore since, by Definition 6.9, Θ♯ = gfp_⊆(ℱ♯), there exists a tree θ♯ in Θ♯ such that θ♯(ϵ) ∈ S♯(ρ, β, N♯). □
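The role of the greatest fixpoint in Definition 6.9, on which the proof above relies, can be illustrated on a finite analogue. In the sketch below, which is only an illustration and not part of the formal development, a rule system maps each (hypothetical) label to its alternative lists of premises, and `step` plays the role of ℱ♯ restricted to sets of labels: a label is kept if one of its rules has all premises in the current set. Iterating upward from the empty set (least fixpoint) keeps only labels with finite derivations; iterating downward from the full set (greatest fixpoint) also keeps labels, such as a while loop, whose only derivations are infinite.

```python
from typing import Callable, Dict, FrozenSet, List

Rules = Dict[str, List[List[str]]]   # label -> alternative premise lists


def step(rules: Rules, u: FrozenSet[str]) -> FrozenSet[str]:
    """Labels having at least one rule whose premises all lie in u."""
    return frozenset(n for n, alts in rules.items()
                     if any(all(p in u for p in alt) for alt in alts))


def iterate(f: Callable[[FrozenSet[str]], FrozenSet[str]],
            start: FrozenSet[str]) -> FrozenSet[str]:
    """Kleene iteration until stabilization (terminates on finite lattices)."""
    current = start
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt


rules: Rules = {
    "cond": [[]],                           # axiom
    "body": [[]],                           # axiom
    "while": [["cond", "body", "while"]],   # mimics rule (109): re-introduces itself
    "stuck": [["missing"]],                 # premise that is never derivable
    "missing": [],                          # no rule at all
}


def f(u: FrozenSet[str]) -> FrozenSet[str]:
    return step(rules, u)


lfp = iterate(f, frozenset())
gfp = iterate(f, frozenset(rules))
print(sorted(lfp), sorted(gfp))   # ['body', 'cond'] ['body', 'cond', 'while']
```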

7. CORRECTNESS OF THE ABSTRACT SEMANTICS 

In Section 6, we introduced the notion of sound approximation for configurations and sequents in terms of the concretization function γ defined for each abstract domain. We now proceed to define the notion of sound approximation for trees.



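8 For the definition of a well-typed sequent, see the proof of Proposition 5.8.

Definition 7.1. ('∝' for trees.) Let ∝ : ℘(Θ × Θ♯) → ℘(Θ × Θ♯) be given, for each U ∈ ℘(Θ × Θ♯), by

$$\mathord{\propto}(U) \stackrel{\mathrm{def}}{=} \left\{ (\theta, \theta^\sharp) \in \Theta \times \Theta^\sharp \;\middle|\; \begin{array}{l} \theta(\epsilon) \propto \theta^\sharp(\epsilon), \\ \forall i \in \mathrm{dom}(\theta) \cap \mathbb{N} : \exists j \in \mathrm{dom}(\theta^\sharp) \cap \mathbb{N} \,.\, (\theta[i], \theta^\sharp[j]) \in U \end{array} \right\}.$$

Then θ ∝ θ♯ if and only if (θ, θ♯) ∈ gfp_⊆(∝).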

In words, θ ∝ θ♯ means that the root of θ is approximated by the root of θ♯ and every immediate subtree of θ is approximated by some immediate subtree of θ♯. Notice that one immediate subtree in θ♯ may be related by '∝' to none, one or more than one immediate subtree of θ.
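For finite trees, the coinductive definition of '∝' can be checked by plain recursion, since there are no infinite descending chains. The sketch assumes, purely for illustration, that concrete labels are integers and abstract labels are intervals, so that a label is approximated when the integer lies in the interval; the class and function names are ours. The first usage shows one abstract subtree covering two concrete subtrees, as remarked above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CTree:                      # concrete tree: integer labels
    label: int
    children: List["CTree"] = field(default_factory=list)


@dataclass
class ATree:                      # abstract tree: interval labels
    label: Tuple[int, int]
    children: List["ATree"] = field(default_factory=list)


def label_approx(c: int, a: Tuple[int, int]) -> bool:
    return a[0] <= c <= a[1]


def tree_approx(t: CTree, ta: ATree) -> bool:
    """Root approximated, and each concrete child approximated by some abstract child."""
    return label_approx(t.label, ta.label) and all(
        any(tree_approx(c, ca) for ca in ta.children) for c in t.children)


abstract = ATree((0, 5), [ATree((0, 2))])
print(tree_approx(CTree(3, [CTree(1), CTree(2)]), abstract))  # True
print(tree_approx(CTree(3, [CTree(7)]), abstract))            # False
```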

The following result states that, for each concrete tree, there is always an abstract 
tree that is generated from a corresponding non-terminal abstract configuration. 

Theorem 7.2. Let θ ∈ Θ be a concrete tree such that θ(ϵ) = (ρ ⊢_β N → η) or θ(ϵ) = (ρ ⊢_β N −∞→). Then there exists θ♯ ∈ Θ♯ such that θ♯(ϵ) = (ρ ⊢_β N♯ → η♯) and N ∝ N♯.

Proof. Suppose first that N = ⟨q, σ⟩ where q ∈ {e, d, g, s, b}. By Definition 6.1, we can always find σ♯ ∈ Mem♯ such that σ ∝ σ♯. Hence, letting N♯ = ⟨q, σ♯⟩, by (78) in Definition 6.4, we obtain N ∝ N♯. Next suppose N = ⟨k, ε⟩, where ε = (σ, ξ). As before, by Definition 6.1, we can always find σ♯ ∈ Mem♯ such that σ ∝ σ♯. Moreover, by the definition of the approximation for exceptions, we can always find ξ♯ ∈ Except♯ such that ξ ∝ ξ♯. Hence, letting N♯ = ⟨k, σ♯ ⊗ ξ♯⟩, by (79) in Definition 6.4, we again obtain N ∝ N♯. In both cases, by Proposition 6.10, there exists an abstract tree θ♯ such that θ♯(ϵ) = (ρ ⊢_β N♯ → η♯) and N ∝ N♯. □

The next result states that our abstract rules only generate abstract trees that 
are correct approximations of their concrete counterparts (i.e., concrete trees rooted 
with the same statement, the same environment and initial memory structure). 

Theorem 7.3. Let θ ∈ Θ and θ♯ ∈ Θ♯ be such that θ(ϵ) = (ρ ⊢_β N → η) or θ(ϵ) = (ρ ⊢_β N −∞→), and θ♯(ϵ) = (ρ ⊢_β N♯ → η♯), where N ∝ N♯. Then θ ∝ θ♯.

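Theorem 7.3 is a trivial corollary of the following proposition.

Proposition 7.4. Let

$$S \stackrel{\mathrm{def}}{=} \left\{ (\theta, \theta^\sharp) \in \Theta \times \Theta^\sharp \;\middle|\; \begin{array}{l} \theta(\epsilon) \in \{\rho \vdash_\beta N \rightarrow \eta,\ \rho \vdash_\beta N \overset{\infty}{\longrightarrow}\}, \\ \theta^\sharp(\epsilon) = \rho \vdash_\beta N^\sharp \rightarrow \eta^\sharp, \\ N \propto N^\sharp \end{array} \right\}. \tag{121}$$

Then, for all (θ, θ♯) ∈ S, θ ∝ θ♯.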
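Proof. Let θ ∈ Θ and θ♯ ∈ Θ♯. We define:

$$r \stackrel{\mathrm{def}}{=} \frac{\theta[0](\epsilon)\ \cdots\ \theta[h-1](\epsilon)}{\theta(\epsilon)}, \qquad r^\sharp \stackrel{\mathrm{def}}{=} \frac{\theta^\sharp[0](\epsilon)\ \cdots\ \theta^\sharp[\ell-1](\epsilon)}{\theta^\sharp(\epsilon)},$$

where, for some h, ℓ ≥ 0, {0, …, h−1} ⊆ dom(θ), {0, …, ℓ−1} ⊆ dom(θ♯), h ∉ dom(θ) and ℓ ∉ dom(θ♯). By Definitions 5.7 and 6.9, r ∈ ℛ and r♯ ∈ ℛ♯. Note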






that, to simplify the proof, we will use the schematic concrete and abstract rules given in Sections 5.5 and 6.9 to denote the actual rule instances r and r♯.

Letting (θ, θ♯) ∈ S, we need to show that θ ∝ θ♯; by Definition 7.1, this is equivalent to showing that (θ, θ♯) ∈ gfp_⊆(∝). To this end, by the principle of fixpoint coinduction, we will show that (θ, θ♯) ∈ ∝(S).

By Definition 7.1, we need to show that the following properties hold:

(i) θ(ϵ) ∝ θ♯(ϵ);

(ii) for each i = 0, …, h−1 there exists j ∈ {0, …, ℓ−1} such that (θ[i], θ♯[j]) ∈ S.

The proof that properties (i) and (ii) hold is by (well-founded) induction on the structure of the concrete tree θ. Observe that the 'immediate subtree' relation between trees in Θ⁺ is a well-founded partial ordering because, if θ ∈ Θ⁺ then, by Definition 5.7, there are no infinite descending chains. We extend this ordering relation to the immediate positive subtree relation between trees in Θ: θ' is said to be an immediate positive subtree of θ if and only if θ' ∈ Θ⁺ and θ' is an immediate subtree of θ. Clearly, by Definition 5.7, the immediate positive subtree ordering on trees in Θ is also well-founded.

We first note that it is not restrictive to only consider unsupported expressions, 
declarations or statements: as noted in Section 6.9, the tree for any supported 
expression (resp., declaration or statement) has the same structure as the tree for 
the same expression (resp., declaration or statement) as if it were unsupported. 
Hence, once correctness of the approximation for unsupported expressions, declara- 
tions or statements is proved, the correctness for their supported counterparts will 
immediately follow from Definition 6.7. 

Let

$$\theta(\epsilon) = (\rho \vdash_\beta N \rightarrow \eta) \text{ or } \theta(\epsilon) = (\rho \vdash_\beta N \overset{\infty}{\longrightarrow}), \qquad \theta^\sharp(\epsilon) = (\rho \vdash_\beta N^\sharp \rightarrow \eta^\sharp).$$

By (121), N ∝ N♯. Therefore, by condition (85) of Definition 6.6, property (i) holds trivially whenever θ ∈ Θ⁻ (i.e., when r is a negative concrete rule). In addition, to prove that property (i) holds for each θ ∈ Θ⁺ (i.e., when r is a positive concrete rule), by condition (84) of Definition 6.6, we just need to show η ∝ η♯.

Consider next property (ii). The base cases are when the concrete rule r has no premises (i.e., h = 0); this property holds trivially in these cases. For the inductive steps (i.e., h > 0), suppose i ∈ {0, …, h−1} and j ∈ {0, …, ℓ−1} are such that (θ[i], θ♯[j]) ∈ S. If θ ∈ Θ⁺ then, by the inductive hypothesis, we can assume that (θ[i], θ♯[j]) ∈ ∝(S); similarly, if θ ∈ Θ⁻ and i ≠ h−1, by the inductive hypothesis, we can assume that (θ[i], θ♯[j]) ∈ ∝(S). Hence, in both cases, by Definition 7.1, θ[i](ϵ) ∝ θ♯[j](ϵ). Also, if θ ∈ Θ⁻, by Definition 5.7, θ[h−1](ϵ) is a divergent sequent so that, by Definitions 6.6 and 7.1, θ[h−1](ϵ) ∝ θ♯[j](ϵ). Thus, for all concrete trees θ ∈ Θ, we can safely assume the following:

$$\forall i \in \{0, \ldots, h-1\}, j \in \{0, \ldots, \ell-1\} : (\theta[i], \theta^\sharp[j]) \in S \implies \theta[i](\epsilon) \propto \theta^\sharp[j](\epsilon). \tag{122}$$

Moreover, we need only explicitly prove property (ii) for each of the positive rules since, by the definition of the concrete divergence (negative) rules, (122) and Definition 6.4, if property (ii) holds for any positive rule it also holds for the corresponding negative rules. Thus in the detailed proofs of properties (i) and (ii) for the inductive steps, we only consider the positive rules.

Table I. Corresponding concrete and abstract rules and terminals for expressions

E           | r     | r♯ | η_e              | (sval♯, σ♯) of η♯_e   | ε♯ of η♯_e
------------+-------+----+------------------+-----------------------+-------------------
con         | 2     | 86 | (con, σ)         | α({con}) ⊗ σ♯         | none♯
id          | 3     | 87 | σ[ρ(id)]         | η♯_e = σ♯[ρ(id)] (both components)
−e          | 4     | 88 | ε                | (⊖m♯, σ♯_0)           | ε♯_0
            | 5     |    | (−m, σ_0)        |                       |
e_0 ⊙ e_1   | 6/7   | 89 | ε                | (m♯_0 ⊙♯ m♯_1, σ♯_1)  | ε♯_0 ⊔ ε♯_1
            |       | 90 |                  | (m♯_0 ⊙♯ m♯_1, σ♯_1)  | ε♯_0 ⊔ ε♯_1 ⊔ ε♯_2
            | 8     | 89 | (m_0 ⊙ m_1, σ_1) | (m♯_0 ⊙♯ m♯_1, σ♯_1)  | ε♯_0 ⊔ ε♯_1
            |       | 90 |                  | (m♯_0 ⊙♯ m♯_1, σ♯_1)  | ε♯_0 ⊔ ε♯_1 ⊔ ε♯_2
            | 9     | 90 | (σ_1, divbyzero) | (m♯_0 ⊙♯ m♯_1, σ♯_1)  | ε♯_0 ⊔ ε♯_1 ⊔ ε♯_2
e_0 ⋈ e_1   | 10/11 | 91 | ε                | (m♯_0 ⋈♯ m♯_1, σ♯_1)  | ε♯_0 ⊔ ε♯_1
            | 12    |    | (m_0 ⋈ m_1, σ_1) |                       |
not b       | 13    | 92 | ε                | (¬♯t♯, σ♯_0)          | ε♯_0
            | 14    |    | (¬t, σ_0)        |                       |
b_0 and b_1 | 15    | 93 | ε                | υ♯_ff ⊔ υ♯_1          | ε♯_0 ⊔ ε♯_1
            | 16    |    | (ff, σ_0)        |                       |
            | 17    |    | η                |                       |
b_0 or b_1  | 18-20 | 94 | Similar to the rows for 'b_0 and b_1'

To help the reader, Tables I, II, III, IV and V contain a summary of the conclusions of rules r and r♯. The first column, Q ∈ {E, D, G, S, B, K}, gives the syntactic forms in the first component of the non-terminal configurations N and N♯ (which, by Definition 6.4, must be the same); the second and third columns give a concrete rule r and an abstract rule r♯, respectively, that apply to Q. Note that we do not pair concrete rules with abstract rules that have mutually inconsistent side conditions. Justification for the omission of any abstract rules for a particular concrete rule r is given in the detailed proof for that case. The column headed η_q, where q ∈ {e, d, g, s, b, k}, gives the concrete terminal configuration for r, while the columns headed by η♯_q give the components of the abstract terminal configuration for r♯. A blank entry in any table cell means that the value is exactly the same as the value found in the same column of the previous row. To save space in Tables II, III, IV and V, we have denoted the operations 'cleanup_d', 'unmark_s', 'unlink_s', 'cleanup♯_d', 'unmark♯_s' and 'unlink♯_s' by 'cu_d', 'um_s', 'ul_s', 'cu♯_d', 'um♯_s' and 'ul♯_s', respectively. Note that the premises and the side conditions for the rules are not provided in any of the tables; reference must be made to the actual rules for this information.

7.1 Expressions 

For this part of the proof, we use Table I. By (121), N ∝ N♯. Thus, letting N = ⟨E, σ⟩ and N♯ = ⟨E, σ♯⟩, by Definition 6.4, we have the implicit hypothesis σ ∝ σ♯. We show, using (80) in Definition 6.5, that η_e ∝ η♯_e.

Constant. Suppose r is an instance of (2). By definition of α : ℘(Integer) → Integer♯ and α : ℘(Bool) → Bool♯, we have con ∝ α({con}); by hypothesis, σ ∝ σ♯, so that (con, σ) ∝ α({con}) ⊗ σ♯. Hence η_e ∝ η♯_e.

Identifier. Suppose r is an instance of (3). Since, by hypothesis, σ ∝ σ♯, by Definition 6.1 we obtain σ[ρ(id)] ∝ σ♯[ρ(id)]. Hence, η_e ∝ η♯_e.

Unary Minus. Suppose r is an instance of (4) or (5). Then, by hypothesis, (θ[0], θ♯[0]) ∈ S and, hence, as h = 1, property (ii) holds. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ). Thus, if r is an instance of (4), then ε ∝ ε♯_0; if r is an instance of (5), then m ∝ m♯ and σ_0 ∝ σ♯_0. In the latter case, by the soundness of '⊖', −m ∝ ⊖m♯. Hence, in both cases, η_e ∝ η♯_e.

Binary Arithmetic Operations. Suppose that r is an instance of one of the rules (6)-(9). Then, by hypothesis, (θ[0], θ♯[0]) ∈ S. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ). Note that, in the condition for abstract rule (90), ε♯_2 = σ♯_1 ⊗ α({divbyzero}).

If r is an instance of (6), then h = 1 so that property (ii) holds. The property θ[0](ϵ) ∝ θ♯[0](ϵ) implies ε ∝ ε♯_0. Therefore,⁹ η_e ∝ η♯_e.

If r is an instance of (7), (8) or (9), then h = 2. Property θ[0](ϵ) ∝ θ♯[0](ϵ) implies σ_0 ∝ σ♯_0 and m_0 ∝ m♯_0; hence (θ[1], θ♯[1]) ∈ S and property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ).

If r is an instance of (7), then property θ[1](ϵ) ∝ θ♯[1](ϵ) implies ε ∝ ε♯_1; thus η_e ∝ η♯_e. If r is an instance of (8), then property θ[1](ϵ) ∝ θ♯[1](ϵ) implies σ_1 ∝ σ♯_1 and m_1 ∝ m♯_1 so that, by the soundness of '⊙♯', (m_0 ⊙ m_1) ∝ (m♯_0 ⊙♯ m♯_1); hence η_e ∝ η♯_e. If r is an instance of (9), then the condition θ[1](ϵ) ∝ θ♯[1](ϵ) implies σ_1 ∝ σ♯_1 and m_1 ∝ m♯_1. Hence, by the side conditions, r♯ must be an instance of (90); so that, as (σ_1, divbyzero) ∝ σ♯_1 ⊗ α({divbyzero}), we have η_e ∝ η♯_e.
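The case analysis for rules (8), (9) and (90) can be made concrete with interval arithmetic. The sketch below assumes, for illustration only, that integer division truncates toward zero and that the abstract divisor is an interval; `div_sharp` returns a result interval (or `None` for ⊥) together with the set of possibly raised exceptions, adding `divbyzero` exactly when 0 may be a divisor, which plays the role of ε♯_2 = σ♯_1 ⊗ α({divbyzero}). The names are illustrative. The divisor interval is split around zero so that the quotient bounds can be read off the corners.

```python
from typing import FrozenSet, Optional, Tuple

Interval = Tuple[int, int]


def tdiv(x: int, y: int) -> int:
    """Concrete division truncating toward zero (assumed semantics), y != 0."""
    q = abs(x) // abs(y)
    return q if (x >= 0) == (y > 0) else -q


def div_nonzero(a: Interval, b: Interval) -> Interval:
    """Quotient bounds when 0 is not in b: extremes are reached at corners."""
    corners = [tdiv(x, y) for x in a for y in b]
    return (min(corners), max(corners))


def div_sharp(a: Interval, b: Interval) -> Tuple[Optional[Interval], FrozenSet[str]]:
    parts = []
    if b[0] <= -1:
        parts.append((b[0], min(b[1], -1)))   # negative divisors
    if b[1] >= 1:
        parts.append((max(b[0], 1), b[1]))    # positive divisors
    result: Optional[Interval] = None
    for p in parts:
        q = div_nonzero(a, p)
        result = q if result is None else (min(result[0], q[0]), max(result[1], q[1]))
    exc = frozenset({"divbyzero"}) if b[0] <= 0 <= b[1] else frozenset()
    return result, exc


print(div_sharp((-5, 5), (-2, 3)))   # ((-5, 5), frozenset({'divbyzero'}))
print(div_sharp((1, 4), (0, 0)))     # (None, frozenset({'divbyzero'}))
```

When the divisor is exactly [0, 0], the normal result is ⊥ and only the exception survives, which matches the situation in which only rule (9) can apply concretely.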

Test Operators. Suppose r is an instance of one of rules (10)-(12). Then, by hypothesis, (θ[0], θ♯[0]) ∈ S. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ).

If r is an instance of (10), then h = 1 and property (ii) holds. θ[0](ϵ) ∝ θ♯[0](ϵ) implies ε ∝ ε♯_0. Hence η_e ∝ η♯_e.

If r is an instance of (11) or (12), then h = 2. θ[0](ϵ) ∝ θ♯[0](ϵ) implies σ_0 ∝ σ♯_0 and m_0 ∝ m♯_0. Thus (θ[1], θ♯[1]) ∈ S and property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ). If r is an instance of (11), then ε ∝ ε♯_1; and if r is an instance of (12), σ_1 ∝ σ♯_1 and m_1 ∝ m♯_1 so that, by soundness of '⋈♯', (m_0 ⋈ m_1) ∝ (m♯_0 ⋈♯ m♯_1). Hence, for both concrete rules, η_e ∝ η♯_e.

Negation. The proof when r is an instance of (13) or (14) has the same structure as the proof for the unary minus case shown before.

Conjunction. Suppose r is an instance of one of rules (15)-(17). By hypothesis, (θ[0], θ♯[0]) ∈ S.



9 Here and in the following, whenever we need to prove x ∝ x♯_0 ⊔ x♯_1, we just prove either x ∝ x♯_0 or x ∝ x♯_1 and implicitly use the monotonicity of γ.

Table II. Corresponding concrete and abstract rules and terminals for declarations

Q                         | r     | r♯  | η_q             | (ρ, σ♯) of η♯_q    | ε♯ of η♯_q
--------------------------+-------+-----+-----------------+--------------------+----------------------
nil                       | 21    | 95  | (∅, σ)          | (∅, σ♯)            | none♯
ρ_0                       | 22    | 96  | (ρ_0, σ)        | (ρ_0, σ♯)          | none♯
rec ρ_0                   | 23    | 97  | (ρ_1, σ)        | (ρ_1, σ♯)          | none♯
gvar id : sT = e          | 24/25 | 98  | cu_d(ε)         | (ρ_1, σ♯_1)        | cu♯_d(ε♯_0 ⊔ ε♯_1)
                          | 26    |     | (ρ_1, σ_1)      |                    |
lvar id : sT = e          | 27/28 | 99  | um_s(ε)         | (ρ_1, σ♯_1)        | um♯_s(ε♯_0 ⊔ ε♯_1)
                          | 29    |     | (ρ_1, σ_1)      |                    |
function id(fps) : sT = e | 30    | 100 | (ρ_0, σ)        | (ρ_0, σ♯)          | none♯
rec g                     | 31    | 101 | η               | η♯_q = η♯ (both components)
g_0; g_1                  | 32/33 | 102 | ε               | (ρ_0[ρ_1], σ♯_1)   | ε♯_0 ⊔ ε♯_1
                          | 34    |     | (ρ_0[ρ_1], σ_1) |                    |
d_0; d_1                  | 35-37 | 103 | Similar to the rows for 'g_0; g_1'



If r is an instance of (15) or (16), then h = 1 and property (ii) holds. If r is an instance of (15), by (122), we have θ[0](ϵ) ∝ θ♯[0](ϵ), which implies ε ∝ ε♯_0. If r is an instance of (16), by Definition 6.2, σ_0 ∝ σ♯_ff = φ(ρ, σ♯_0, not b_0). Thus, since ff ∝ α({ff}) holds by definition, we have (ff, σ_0) ∝ υ♯_ff. Hence, for both concrete rules, η_e ∝ η♯_e.

If r is an instance of (17), then h = 2. By Definition 6.2, σ_0 ∝ σ♯_tt, so that (θ[1], θ♯[1]) ∈ S and property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ), so that η ∝ (υ♯_1, ε♯_1). Hence, η_e ∝ η♯_e.

Disjunction. The proof when r is an instance of one of rules (18)-(20) is similar 
to that for conjunction. 

7.2 Declarations 

In Table II, Q denotes a local declaration D or a global declaration G. Moreover, η_q ∈ {T_d, T_g} and η♯_q ∈ {T♯_d, T♯_g}; the actual domains for η_q and η♯_q will depend on context.

By (121) we have N ∝ N♯. Thus, letting N = ⟨Q, σ⟩ and N♯ = ⟨Q, σ♯⟩ for any Q ∈ {D, G}, by Definition 6.4, we have the implicit hypothesis σ ∝ σ♯. We show, using (81) in Definition 6.5, that η_q ∝ η♯_q.

Nil. If r is an instance of (21) then, by the hypothesis, η_q ∝ η♯_q.

(Recursive) Environment. If r is an instance of (22) or (23) then, by the hypothesis, η_q ∝ η♯_q.

Global Variable Declaration. If r is an instance of one of rules (24)-(26) then, by the hypothesis, (θ[0], θ♯[0]) ∈ S so that, as h = 1, property (ii) holds. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ). If r is an instance of (24), then θ[0](ϵ) ∝ θ♯[0](ϵ) implies ε ∝ ε♯_0; by Definition 6.1 and monotonicity of γ, we have cleanup_d(ε) ∝ cleanup♯_d(ε♯_0 ⊔ ε♯_1), i.e., η_q ∝ η♯_q. If r is an instance of (25) or (26), then θ[0](ϵ) ∝ θ♯[0](ϵ) implies υ ∝ υ♯. By Definition 6.1, new_d(υ) ∝ new♯_d(υ♯). By the side condition for abstract rule (98), new♯_d(υ♯) = ((σ♯_1, l), ε♯_1). By the side conditions for (25) and (26), either new_d(υ) = ε ∝ ε♯_1 (and hence cleanup_d(ε) ∝ cleanup♯_d(ε♯_0 ⊔ ε♯_1) by Definition 6.1) or new_d(υ) = (σ_1, l) ∝ (σ♯_1, l). Thus, in both cases, η_q ∝ η♯_q.

Local Variable Declaration. The proof for local variable declaration, when r is an instance of one of rules (27)-(29), is the same as that for global variable declaration, with the few necessary adjustments (i.e., using unmark_s, unmark♯_s, new_s, new♯_s and i in place of cleanup_d, cleanup♯_d, new_d, new♯_d and l).

Function Declaration. If r is an instance of (30) then, by the hypothesis, η_q ∝ η♯_q.

Recursive Declaration. If r is an instance of (31), then h = 2 and, by the hypothesis, (θ[0], θ♯[0]) ∈ S. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ), which implies that ρ_0 denotes the same environment in both r and r♯ and σ_0 ∝ σ♯_0. Hence, (θ[1], θ♯[1]) ∈ S and property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ), which implies η ∝ η♯. Hence, η_q ∝ η♯_q.

Global Sequential Composition. If r is an instance of one of rules (32)-(34), then 1 ≤ h ≤ 2 and (θ[0], θ♯[0]) ∈ S. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ).

If r is an instance of (32), then h = 1 and property (ii) holds. Also, θ[0](ϵ) ∝ θ♯[0](ϵ) implies ε ∝ ε♯_0 and hence η_q ∝ η♯_q.

If r is an instance of (33) or (34), then h = 2 and, since σ_0 ∝ σ♯_0, (θ[1], θ♯[1]) ∈ S, so that property (ii) holds. By (122), we have θ[1](ϵ) ∝ θ♯[1](ϵ). If r is an instance of (33), then θ[1](ϵ) ∝ θ♯[1](ϵ) implies ε ∝ ε♯_1, so that η_q ∝ η♯_q. If r is an instance of (34), then θ[0](ϵ) ∝ θ♯[0](ϵ) and θ[1](ϵ) ∝ θ♯[1](ϵ) imply that σ_1 ∝ σ♯_1 and that the two environments ρ_0 and ρ_1 are the same in both r and r♯. Hence, their composition ρ_0[ρ_1] is the same in both rules r and r♯, so that η_q ∝ η♯_q.

Local Sequential Composition. The proof when r is an instance of one of rules 
(35)-(37) is similar to that for global sequential composition. 

7.3 Statements 

For this part of the proof, we use Table III. By (121), N ∝ N♯. Thus, letting N = ⟨s, σ⟩ and N♯ = ⟨s, σ♯⟩, by Definition 6.4, we have the implicit hypothesis σ ∝ σ♯. We show, using (82) in Definition 6.5, that η_s ∝ η♯_s.

Nop. If r is an instance of (38) then, by the hypothesis, η_s ∝ η♯_s.

Assignment. Suppose r is an instance of (39) or (40). Then h = 1 and, by the hypothesis, (θ[0], θ♯[0]) ∈ S and hence property (ii) holds. By (122) we have θ[0](ϵ) ∝ θ♯[0](ϵ). If r is an instance of (39), ε ∝ ε♯_0. Moreover, if r is an instance of (40), (sval, σ_0) ∝ (sval♯, σ♯_0) so that, by Definition 6.1, σ_0[ρ(id) := sval] ∝ σ♯_0[ρ(id) := sval♯]; letting σ♯_0[ρ(id) := sval♯] = (σ♯_1, ε♯_1), this means that either we have σ_0[ρ(id) := sval] ∈ ExceptState, so that σ_0[ρ(id) := sval] ∝ ε♯_1, or we have σ_0[ρ(id) := sval] ∈ Mem, so that σ_0[ρ(id) := sval] ∝ σ♯_1. In all cases, η_s ∝ η♯_s.

Table III. Corresponding concrete and abstract rules and terminals for statements

s                      | r     | r♯  | η_s                | σ♯ of η♯_s    | ε♯ of η♯_s
-----------------------+-------+-----+--------------------+---------------+------------------------------------------
nop                    | 38    | 104 | σ                  | σ♯            | none♯
id := e                | 39    | 105 | ε                  | σ♯_1          | ε♯_0 ⊔ ε♯_1
                       | 40    |     | σ_0[ρ(id) := sval] |               |
s_0; s_1               | 41    | 106 | ε                  | σ♯_1          | ε♯_0 ⊔ ε♯_1
                       | 42    |     | η                  |               |
d; s                   | 43    | 107 | ε                  | um♯_s(σ♯_1)   | ε♯_0 ⊔ um♯_s(ε♯_1)
                       | 44    |     | um_s(η)            |               |
if e then s_0 else s_1 | 45    | 108 | ε                  | σ♯_1 ⊔ σ♯_2   | ε♯_0 ⊔ ε♯_1 ⊔ ε♯_2
                       | 46/47 |     | η                  |               |
while e do s           | 48/50 | 109 | ε                  | σ♯_ff ⊔ σ♯_2  | ε♯_0 ⊔ ε♯_1 ⊔ ε♯_2
                       | 49    |     | σ_0                |               |
                       | 51    |     | η                  |               |
throw s                | 52    | 110 | (σ, χ)             | ⊥             | ε♯
                       | 53    | 111 | ε                  | ⊥             | ε♯_0 ⊔ ε♯_1
                       | 54    |     | (σ_0, sval)        |               |
try s catch k          | 55    | 112 | σ_0                | σ♯_0 ⊔ σ♯_1   | ε♯_1 ⊔ ε♯_2
                       | 56    |     | η                  |               |
try s_0 finally s_1    | 57    | 113 | η                  | σ♯_1          | ε♯_1 ⊔ ε♯_3 ⊔ (σ♯_3 ⊗ ξ♯_2)
                       | 58    |     | (σ_1, ξ_0)         |               |
                       | 59    |     | ε                  |               |
id := id(e_1, …, e_n)  | 62    | 114 | ε                  | um♯_s(σ♯_2)   | ε♯_0 ⊔ um♯_s(ul♯_s(ε♯_1)) ⊔ um♯_s(ε♯_2)
                       | 63    |     | um_s(ul_s(ε))      |               |
                       | 64    |     | um_s(η_2)          |               |



Statement Sequence. Suppose r is an instance of (41) or (42). Then 1 ≤ h ≤ 2 and, by the hypothesis, (θ[0], θ♯[0]) ∈ S. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ). If r is an instance of rule (41), as h = 1, property (ii) holds and also ε ∝ ε♯_0. If r is an instance of (42), then σ_0 ∝ σ♯_0 so that (θ[1], θ♯[1]) ∈ S and, as h = 2, property (ii) holds; by (122), θ[1](ϵ) ∝ θ♯[1](ϵ) so that η ∝ (σ♯_1, ε♯_1). Hence, in both cases, η_s ∝ η♯_s.

Block. Suppose r is an instance of (43) or (44). Then 1 ≤ h ≤ 2 and, by the hypothesis and Definition 6.1, (θ[0], θ♯[0]) ∈ S. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ). If r is an instance of (43), as h = 1, property (ii) holds and also ε ∝ ε♯_0. If r is an instance of (44), then σ_0 ∝ σ♯_0 so that (θ[1], θ♯[1]) ∈ S; also, as h = 2, property (ii) holds; by (122), θ[1](ϵ) ∝ θ♯[1](ϵ), so that η ∝ (σ♯_1, ε♯_1) and therefore, by Definition 6.1, unmark_s(η) ∝ (unmark♯_s(σ♯_1), unmark♯_s(ε♯_1)). Hence, in both cases, η_s ∝ η♯_s.

Conditional. Suppose r is an instance of one of rules (45)-(47). Then 1 ≤ h ≤ 2 and, by the hypothesis, (θ[0], θ♯[0]) ∈ S. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ).

If r is an instance of (45), h = 1, property (ii) holds and, as ε ∝ ε♯_0, η_s ∝ η♯_s.

If r is an instance of (46) or (47), then h = 2 and σ_0 ∝ σ♯_0. By the side conditions and Definition 6.2, if tt ∝ t♯, then (tt, σ_0) ∝ (t♯, σ♯_tt) and, if ff ∝ t♯, then (ff, σ_0) ∝ (t♯, σ♯_ff). Hence, if (46) applies, θ[1](ϵ) ∝ θ♯[1](ϵ) so that η ∝ (σ♯_1, ε♯_1); and, if (47) applies, θ[1](ϵ) ∝ θ♯[2](ϵ) so that η ∝ (σ♯_2, ε♯_2). Hence, in both cases, η_s ∝ η♯_s.

While. Suppose r is an instance of one of rules (48)-(51). Then 1 ≤ h ≤ 3 and, by hypothesis, (θ[0], θ♯[0]) ∈ S. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ).

If r is an instance of (48), h = 1, property (ii) holds and, as ε ∝ ε♯_0, η_s ∝ η♯_s.

Suppose r is an instance of (49), (50) or (51). By the side conditions and Definition 6.2, if tt ∝ t♯, then (tt, σ_0) ∝ (t♯, σ♯_tt) and, if ff ∝ t♯, then (ff, σ_0) ∝ (t♯, σ♯_ff).

If r is an instance of (49), then, as h = 1, property (ii) holds and hence η_s ∝ η♯_s.

If r is an instance of (50), then h = 2. Thus (θ[1], θ♯[1]) ∈ S and property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ) so that ε ∝ ε♯_1. Hence η_s ∝ η♯_s.

If r is an instance of (51), then h = 3. Thus (θ[1], θ♯[1]) ∈ S. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ) so that σ_1 ∝ σ♯_1. Thus (θ[2], θ♯[2]) ∈ S and property (ii) holds. By (122), θ[2](ϵ) ∝ θ♯[2](ϵ) so that η ∝ (σ♯_2, ε♯_2). Hence η_s ∝ η♯_s.
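The filtered memories σ♯_tt and σ♯_ff used in the conditional and while cases (Definition 6.2) can be illustrated with intervals. The sketch assumes a hypothetical filter for tests of the form var < c on an interval memory (`None` is ⊥); it is one simple instance of such a filter, not the paper's definition. An empty refinement yields ⊥, which is how an abstract analyzer can discover that a branch is never taken.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]
Mem = Optional[Dict[str, Interval]]   # None is bottom


def filter_lt(mem: Mem, var: str, c: int, outcome: bool) -> Mem:
    """Keeps the states where 'var < c' evaluates to `outcome`."""
    if mem is None:
        return None
    lo, hi = mem[var]
    lo2, hi2 = (lo, min(hi, c - 1)) if outcome else (max(lo, c), hi)
    if lo2 > hi2:
        return None                   # this branch cannot be taken
    return {**mem, var: (lo2, hi2)}


sigma = {"x": (0, 10), "y": (1, 1)}
print(filter_lt(sigma, "x", 5, True))              # {'x': (0, 4), 'y': (1, 1)}
print(filter_lt(sigma, "x", 5, False))             # {'x': (5, 10), 'y': (1, 1)}
print(filter_lt({"x": (6, 10)}, "x", 5, True))     # None
```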

Throw. Suppose r is an instance of (52). Then s = χ ∈ RTSExcept (so that rule (111) is not applicable). By definition of α : ℘(RTSExcept) → RTSExcept♯, χ ∝ α({χ}). Since, by hypothesis, σ ∝ σ♯, we have (σ, χ) ∝ σ♯ ⊗ α({χ}) so that, by the side condition for (110), (σ, χ) ∝ ε♯. Hence η_s ∝ η♯_s.

Suppose r is an instance of (53) or (54). Then s = e ∈ Exp (so that rule (110) is not applicable). By hypothesis, (θ[0], θ♯[0]) ∈ S and, as h = 1, property (ii) holds. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ). If r is an instance of (53), then ε ∝ ε♯_0, while, if r is an instance of (54), sval ∝ sval♯ and σ_0 ∝ σ♯_0. Hence, in both cases, η_s ∝ η♯_s.

Try Blocks. Suppose r is an instance of (55)-(59). By hypothesis, (θ[0], θ♯[0]) ∈ S. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ). Note that if r is an instance of (55) or (56), only abstract rule (112) will be applicable while if r is an instance of (57)-(59), only abstract rule (113) will be applicable.

If r is an instance of (55), h = 1, property (ii) holds and, as σ_0 ∝ σ♯_0, η_s ∝ η♯_s.

If r is an instance of (56), then ε ∝ ε♯_0 so that (θ[1], θ♯[1]) ∈ S. Thus, as h = 2, property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ) so that (u, η) ∝ ((σ♯_1, ε♯_1), ε♯_2) where u ∈ {caught, uncaught}. By Definition 6.5, if u = caught, then η ∝ (σ♯_1, ε♯_1) and, if u = uncaught, then η ∝ ε♯_2. Hence, in both cases, η_s ∝ η♯_s.

If r is an instance of rule (57), σ_0 ∝ σ♯_0; hence (θ[1], θ♯[1]) ∈ S and property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ) so that η ∝ (σ♯_1, ε♯_1). Hence η_s ∝ η♯_s.

If r is an instance of (58) or (59), (σ_0, ξ_0) ∝ (σ♯_2, ξ♯_2), hence σ_0 ∝ σ♯_2 and ξ_0 ∝ ξ♯_2, so that (θ[1], θ♯[2]) ∈ S and property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[2](ϵ). Thus, if (58) applies, σ_1 ∝ (σ♯_3, ε♯_3) so that (σ_1, ξ_0) ∝ σ♯_3 ⊗ ξ♯_2, and, if (59) applies, ε ∝ (σ♯_3, ε♯_3) so that ε ∝ ε♯_3. Hence, in both cases, η_s ∝ η♯_s.

Function call. If r is an instance of one of rules (62)-(64), then 1 ≤ h ≤ 3 and ℓ = 3. Then the conditions (60) and (61) are also conditions for abstract rule (114). By hypothesis and Definition 6.1, (θ[0], θ♯[0]) ∈ S; by (122), θ[0](ϵ) ∝ θ♯[0](ϵ).

Table IV. Corresponding concrete and abstract rules and terminals for function bodies

B                   | r  | r♯  | η_b             | σ♯ of η♯_b   | ε♯ of η♯_b
--------------------+----+-----+-----------------+--------------+---------------------------
let d in s result e | 65 | 115 | ε               | um♯_s(σ♯_2)  | ε♯_0 ⊔ um♯_s(ε♯_1 ⊔ ε♯_2)
                    | 66 |     | um_s(ε)         |              |
                    | 67 |     | um_s(η_0)       |              |
extern : sT         | 68 | 116 | σ_0 or (σ_0, ξ) | σ♯_0         | (σ♯_0, ⊤)







If r is an instance of (62), then ε ∝ ε♯_0, h = 1 and property (ii) holds. Hence η_s ∝ η♯_s.

If r is an instance of (63), then σ_0 ∝ σ♯_0 so that, by Definition 6.1, (θ[1], θ♯[1]) ∈ S; also, as h = 2, property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ) and ε ∝ ε♯_1; by Definition 6.1, unmark_s(unlink_s(ε)) ∝ unmark♯_s(unlink♯_s(ε♯_1)). Hence η_s ∝ η♯_s.

If r is an instance of (64), then σ_1 ∝ σ♯_1 so that, by Definition 6.1, (θ[2], θ♯[2]) ∈ S; also, as h = 3, property (ii) holds. By (122), θ[2](ϵ) ∝ θ♯[2](ϵ) and η_2 ∝ (σ♯_2, ε♯_2); by Definition 6.1, unmark_s(η_2) ∝ (unmark♯_s(σ♯_2), unmark♯_s(ε♯_2)). Hence η_s ∝ η♯_s.

7.4 Function Bodies 

For this part of the proof, we use Table IV. By (121), N ∝ N♯. Thus, letting N = ⟨B, σ⟩ and N♯ = ⟨B, σ♯⟩, by Definition 6.4, we have the implicit hypothesis σ ∝ σ♯. We show, using (82) in Definition 6.5, that η_b ∝ η♯_b.

Suppose r is an instance of one of rules (65)-(67). By hypothesis and Definition 6.1, mark_s(σ) ∝ mark♯_s(σ♯), so that (θ[0], θ♯[0]) ∈ S. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ).

If r is an instance of (65), ε ∝ ε♯_0, h = 1 and property (ii) holds. Hence η_b ∝ η♯_b.

If r is an instance of (66), σ_0 ∝ σ♯_0; hence (θ[1], θ♯[1]) ∈ S and, as h = 2, property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ) so that ε ∝ ε♯_1. By Definition 6.1, unmark_s(ε) ∝ unmark♯_s(ε♯_1); hence η_b ∝ η♯_b.

If r is an instance of (67), σ_0 ∝ σ♯_0; hence (θ[1], θ♯[1]) ∈ S. By (122) we have θ[1](ϵ) ∝ θ♯[1](ϵ), so that σ_1 ∝ σ♯_1; hence (θ[2], θ♯[2]) ∈ S and, as h = 3, property (ii) holds. Again, by (122), θ[2](ϵ) ∝ θ♯[2](ϵ); hence, η_0 ∝ (σ♯_2, ε♯_2). By Definition 6.1, unmark_s(η_0) ∝ (unmark♯_s(σ♯_2), unmark♯_s(ε♯_2)); hence η_b ∝ η♯_b.

Suppose r is an instance of (68). Then σ = (μ, w) and σ_0 = (μ_0, w). By the hypothesis, σ ∝ σ♯; hence, by the side conditions, σ_0 ∝ σ♯_0; also, ξ ∝ ⊤, so that η_b ∝ η♯_b.

7.5 Catch Clauses 

For this part of the proof, we use Table V. By (121), N ∝ N♯. Thus, letting N = ⟨K, ε⟩ and N♯ = ⟨K, ε♯⟩, by Definition 6.4, we have the implicit hypothesis ε ∝ ε♯. We show, using (83) in Definition 6.5, that η_k ∝ η♯_k.

Catch. Let K have the form (p) s for some exception declaration p. Suppose r is an instance of one of rules (69)-(71). Then, by the hypothesis and Definition 6.3, ε ∝ φ+(p, ε♯); by the side conditions for the abstract rules, ε ∝ ε♯_0.

Table V. Corresponding concrete and abstract rules and terminals for catch clauses

K                      | r  | r♯  | η_k                  | η♯_a of η♯_k                                           | ε♯_a of η♯_k
-----------------------+----+-----+----------------------+--------------------------------------------------------+-------------
(any) s, (χ) s, (sT) s | 69 | 117 | (caught, η_0)        | η♯_1                                                   | ε♯_1
(id : sT) s            | 70 | 118 | (caught, um_s(ε_0))  | (σ♯_4, ε♯_4) = (um♯_s(σ♯_3), um♯_s(ε♯_2) ⊔ um♯_s(ε♯_3)) | ε♯_1
                       | 71 |     | (caught, um_s(η_0))  |                                                        |
(χ) s, (sT) s          | 72 | 117 | (uncaught, (σ, ξ))   | η♯_1                                                   | ε♯_1
(id : sT) s            | 72 | 118 | (uncaught, (σ, ξ))   | (σ♯_4, ε♯_4)                                           | ε♯_1
k_0; k_1               | 73 | 119 | (caught, η_0)        | (σ♯_0 ⊔ σ♯_2, ε♯_0 ⊔ ε♯_2)                             | ε♯_3
                       | 74 |     | η                    |                                                        |



If r is an instance of (69) then ε = (σ, ξ); by Definition 6.1, σ ∝ mem(ε♯_0). Hence (θ[0], θ♯[0]) ∈ S and, as h = 1, property (ii) holds. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ), which implies η_0 ∝ η♯_1, so that η_k ∝ η♯_k.

If r is an instance of (70) or (71), then ε = (σ, sval) and type(sval) = sT; by Definition 6.1, σ ∝ mem(ε♯_0) and sval ∝ sT(ε♯_0). Hence, by Definition 6.1,

$$\mathrm{new}_s\bigl(\mathit{sval}, \mathrm{mark}_s(\sigma)\bigr) \propto \mathrm{new}^\sharp_s\bigl(\mathit{sT}(\varepsilon^\sharp_0), \mathrm{mark}^\sharp_s(\mathrm{mem}(\varepsilon^\sharp_0))\bigr) = \bigl((\sigma^\sharp_2, i), \varepsilon^\sharp_2\bigr). \tag{123}$$

If (70) applies, then h = 0, so that property (ii) holds trivially, and, by the side condition, ε_0 = new_s(sval, mark_s(σ)) so that, by (123), ε_0 ∝ ε♯_2; by Definition 6.1, unmark_s(ε_0) ∝ unmark♯_s(ε♯_2). If (71) applies, then, by the side condition, (σ_0, i) = new_s(sval, mark_s(σ)) so that, by (123), σ_0 ∝ σ♯_2. Hence, (θ[0], θ♯[0]) ∈ S and, as h = 1, property (ii) holds. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ), which implies η_0 ∝ (σ♯_3, ε♯_3). Thus, by Definition 6.1, unmark_s(η_0) ∝ (unmark♯_s(σ♯_3), unmark♯_s(ε♯_3)). Hence, in both cases, η_k ∝ η♯_k.

If r is an instance of (72), then h = 0, so that property (ii) holds trivially. We have ε = (σ, ξ) and, by the side condition, p ∉ {ξ, cT, any}, where cT = type(ξ). If p ∈ {χ, sT} then abstract rule (117) applies so that, by the hypothesis, the side conditions and Definition 6.3, (σ, ξ) ∝ φ−(p, ε♯) = ε♯_1. Similarly, if p = id : sT and abstract rule (118) applies, (σ, ξ) ∝ φ−(sT, ε♯) = ε♯_1. Hence, in both cases, η_k ∝ η♯_k.

Catch Sequence. If r is an instance of (73), then as h = 1 and (θ[0], θ♯[0]) ∈ S, property (ii) holds. By (122), θ[0](ϵ) ∝ θ♯[0](ϵ), so that (caught, η_0) ∝ ((σ♯_0, ε♯_0), ε♯_1). By (83) in Definition 6.5, η_0 ∝ (σ♯_0, ε♯_0), which implies η_k ∝ η♯_k.

If r is an instance of (74), then (θ[0], θ♯[0]) ∈ S and, by (122), θ[0](ϵ) ∝ θ♯[0](ϵ). Thus, (uncaught, ε_0) ∝ ((σ♯_0, ε♯_0), ε♯_1), so that, by (83) in Definition 6.5, ε_0 ∝ ε♯_1. Hence (θ[1], θ♯[1]) ∈ S and, as h = 2, property (ii) holds. By (122), θ[1](ϵ) ∝ θ♯[1](ϵ), so that η ∝ ((σ♯_2, ε♯_2), ε♯_3), which implies η_k ∝ η♯_k. □

A few observations regarding the precision of the proposed approximations are in order. Consider an abstract tree θ♯ ∈ Θ♯ such that θ♯(ϵ) = (ρ ⊢_β N♯ → η♯), where N♯ ∈ Γ♯_s and η♯ ∈ T♯_s. If the concretization functions relating the concrete and abstract domains are strict, then the abstract tree above will encode the following definite information:

- non-terminating computations (i.e., unreachable code), if η♯ = ⊥;
- non-exceptional computations, if η♯ = (σ♯, none♯) and σ♯ ≠ ⊥;
- exceptional computations, if η♯ = (⊥, ε♯) and ε♯ ≠ none♯.

Obviously, a precise propagation of this definite information requires that all of the 
abstract domain operators are strict too. Hence, if 0"(e) = (p \~p (s, _L) — ► 77"), 
we will also have 77" = _L. Similar properties hold when considering expressions, 
declarations and catch clauses. 

8. COMPUTING ABSTRACT TREES 

The results of the previous section (Theorems 7.2 and 7.3) guarantee that each 
concrete tree can be safely approximated by an abstract tree, provided the non- 
terminal configurations in the roots satisfy the approximation relation. 

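For expository purposes, suppose we are interested in a whole-program analysis.
For each (concrete and abstract) pair of initial memories satisfying $\sigma_i \propto \sigma^\sharp_i$ and each
$g_0 = (g;\ \mathtt{gvar}\ x : \mathtt{integer} = 0)$, where $g$ is a valid program, we obtain that any
abstract tree $\theta^\sharp_0 \in \Theta^\sharp$ such that $\theta^\sharp_0(\epsilon) = (\emptyset \vdash_\emptyset \langle g_0, \sigma^\sharp_i \rangle \longrightarrow \eta^\sharp_0)$ correctly approximates
each concrete tree $\theta_0 \in \Theta$ such that $\theta_0(\epsilon) = (\emptyset \vdash_\emptyset \langle g_0, \sigma_i \rangle \longrightarrow \eta_0)$. Notice that $\theta^\sharp_0$ is
a finite tree. Letting $\eta^\sharp_0 = \langle (\rho_0, \sigma^\sharp_0), \varepsilon^\sharp_0 \rangle$ and assuming $\eta_0 \notin \mathrm{ExceptState}$, we obtain
$\eta_0 = (\rho_0, \sigma_0)$, where $\sigma_0 \propto \sigma^\sharp_0$. Hence, letting $s_0 = (x := \mathtt{main}(\square))$ and $\rho_0 : \beta$, any
abstract tree $\theta^\sharp_1 \in \Theta^\sharp$ such that $\theta^\sharp_1(\epsilon) = (\rho_0 \vdash_\beta \langle s_0, \sigma^\sharp_0 \rangle \longrightarrow \eta^\sharp_1)$ correctly approx-
imates each concrete tree $\theta_1 \in \Theta$ such that either $\theta_1(\epsilon) = (\rho_0 \vdash_\beta \langle s_0, \sigma_0 \rangle \longrightarrow \eta_1)$
or $\theta_1(\epsilon) = (\rho_0 \vdash_\beta \langle s_0, \sigma_0 \rangle \overset{\infty}{\longrightarrow})$. We are thus left with the problem of computing
(any) one of these abstract trees, which are usually infinite. In particular, we are
interested in choosing $\theta^\sharp_1$ in a subclass of trees admitting finite representations and,
within this class, in maintaining a level of accuracy that is compatible with the
complexity/precision trade-off dictated by the application.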

A classical choice is to restrict attention to rational trees, that is, trees with only 
finitely many subtrees: the algorithm sketched in [Sch95; Sch97; Sch98], which 
assumes that the abstract domain is Noetherian (i.e., all of its ascending chains are 
finite), guides the analysis toward the computation of a rational tree by forcing each
infinite path to contain a repetition node. Below we describe a variation, also
working for abstract domains that admit infinite ascending chains, that exploits 
widening operators [CC76; CC77a; CC92b]. 

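Definition 8.1. (Widening operators.) Let $(D^\sharp, \sqsubseteq, \bot, \sqcup)$ be an abstract do-
main. The partial operator $\nabla : D^\sharp \times D^\sharp \rightharpoonup D^\sharp$ is a widening if:

— for all $x^\sharp, y^\sharp \in D^\sharp$, $y^\sharp \sqsubseteq x^\sharp$ implies that $y^\sharp \nabla x^\sharp$ is defined and $x^\sharp \sqsubseteq y^\sharp \nabla x^\sharp$;

— for all increasing chains $x^\sharp_0 \sqsubseteq x^\sharp_1 \sqsubseteq \cdots$, the increasing chain defined by $y^\sharp_0 \stackrel{\mathrm{def}}{=} x^\sharp_0$
and $y^\sharp_{i+1} \stackrel{\mathrm{def}}{=} y^\sharp_i \nabla (y^\sharp_i \sqcup x^\sharp_{i+1})$, for $i \in \mathbb{N}$, is not strictly increasing.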
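To make Definition 8.1 concrete, the following Python sketch instantiates it on a hypothetical interval domain for a single integer variable. The domain and the names `leq`, `join`, `widen` and `widened_chain` are illustrative assumptions, not part of the paper; the widening shown is the classical one that pushes every unstable bound to infinity, which satisfies both conditions of the definition.

```python
import math
from typing import List, Optional, Tuple

# A hypothetical abstract domain for the values of one integer variable:
# None is bottom, otherwise (lo, hi) with bounds possibly -inf/+inf.
Interval = Optional[Tuple[float, float]]


def leq(a: Interval, b: Interval) -> bool:
    """The partial order: a is contained in b."""
    if a is None:
        return True
    return b is not None and b[0] <= a[0] and a[1] <= b[1]


def join(a: Interval, b: Interval) -> Interval:
    """The least upper bound: the smallest interval containing both."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(y: Interval, x: Interval) -> Interval:
    """Classical interval widening, defined only when y <= x: every bound of
    y that is not stable in x is pushed to infinity, so x <= widen(y, x)."""
    if not leq(y, x):
        raise ValueError("widening is only defined when y <= x")
    if y is None:
        return x
    lo = y[0] if y[0] <= x[0] else -math.inf
    hi = y[1] if y[1] >= x[1] else math.inf
    return (lo, hi)


def widened_chain(xs: List[Interval]) -> List[Interval]:
    """y_0 = x_0 and y_{i+1} = y_i widen (y_i join x_{i+1}), as in Definition 8.1."""
    ys = [xs[0]]
    for x in xs[1:]:
        ys.append(widen(ys[-1], join(ys[-1], x)))
    return ys
```

For the strictly increasing chain (0, 0), (0, 1), (0, 2), ..., `widened_chain` yields (0, 0) followed by (0, inf) repeated: the widened chain stabilizes after a single step, as the second condition requires.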

The algorithm works by recursively constructing a finite approximation for the
abstract subtree rooted in the current node (initially, the root of the whole tree). Let
$n = (\rho \vdash_\beta \langle q, y^\sharp_n \rangle \longrightarrow r_n)$ be the current node, where $q$ is a uniquely labeled program
phrase,$^{10}$ $y^\sharp_n \in D^\sharp$ is either an abstract memory $\sigma^\sharp \in \mathrm{Mem}^\sharp$ or an abstract exception
state $\varepsilon^\sharp \in \mathrm{ExceptState}^\sharp$, and $r_n$ is a placeholder for the "yet to be computed"
conclusion. The node n is processed according to the following alternatives.

(i) If no ancestor of n is labeled by the program phrase q, the node has to be 
expanded using an applicable abstract rule instance. Namely, descendants of 
the premises of the rule are (recursively) processed, one at a time and from 
left to right. When the expansion of all the premises has been completed, 
including the case when the rule has no premises at all, the placeholder $r_n$ is
replaced by an abstract value computed according to the conclusion of the 
rule. 

(ii) If there exists an ancestor node $m = (\rho \vdash_\beta \langle q, y^\sharp_m \rangle \longrightarrow r_m)$ of n labeled by the
same program phrase q and such that $y^\sharp_n \sqsubseteq y^\sharp_m$, i.e., if node n is subsumed
by node m, then the node is not expanded further and the placeholder $r_n$ is
replaced by the least fixpoint of the equation $r_n = f_m(r_n)$, where $f_m$ is the
expression corresponding to the conclusion of the abstract rule that was used
for the expansion of node m.$^{11}$ Intuitively, an infinite subtree rooted in node
m has been identified and the "repetition node" n is transformed into a back
edge to the root m of this subtree.

(iii) Otherwise, there must be an ancestor node $m = (\rho \vdash_\beta \langle q, y^\sharp_m \rangle \longrightarrow r_m)$ of n
labeled by the same program phrase q, but the subsumption condition $y^\sharp_n \sqsubseteq y^\sharp_m$
does not hold. Then, to ensure convergence, the abstract element $y^\sharp_n$ in node
n is further approximated by $y^\sharp_m \nabla (y^\sharp_m \sqcup y^\sharp_n)$ and we proceed as in case (i).

Termination of the algorithm can be proved thanks to the following observations: 
an infinite abstract tree necessarily has infinite paths (since the tree is finitely 
branching); each infinite path necessarily has an infinite number of nodes labeled by 
the same program phrase (since the set of program phrases is finite); the application 
of case (iii) leads to the computation, along each infinite path, of increasing chains 
of abstract elements and, by Definition 8.1, these chains are necessarily finite; hence, 
case (ii) is eventually applied to all infinite paths, leading to a finite representation 
of the rational tree where all the infinite paths are expressed by using back edges. 
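The following Python sketch illustrates cases (i)-(iii) on a deliberately tiny instance: a single program phrase q, the loop `while x < bound do x := x + step`, analyzed with an interval widening in the sense of Definition 8.1. The abstract rule for `while` assumed here (the exit part joined with the result of the next iteration), the helper names and the restriction to one variable are hypothetical simplifications, not the paper's abstract semantics.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical setting: intervals for a single integer variable x, with
# None as bottom, and the single phrase q = while x < bound do x := x + step.
Interval = Optional[Tuple[float, float]]


def leq(a: Interval, b: Interval) -> bool:
    if a is None:
        return True
    return b is not None and b[0] <= a[0] and a[1] <= b[1]


def join(a: Interval, b: Interval) -> Interval:
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(y: Interval, x: Interval) -> Interval:
    # Interval widening; only called with y <= x, as Definition 8.1 requires.
    if y is None:
        return x
    return (y[0] if y[0] <= x[0] else -math.inf,
            y[1] if y[1] >= x[1] else math.inf)


def guard_lt(a: Interval, n: int) -> Interval:
    """Abstract filter for the loop guard x < n."""
    if a is None or a[0] >= n:
        return None
    return (a[0], min(a[1], n - 1))


def guard_ge(a: Interval, n: int) -> Interval:
    """Abstract filter for the loop exit condition x >= n."""
    if a is None or a[1] < n:
        return None
    return (max(a[0], n), a[1])


def shift(a: Interval, k: int) -> Interval:
    """Abstract effect of the loop body x := x + k."""
    return None if a is None else (a[0] + k, a[1] + k)


def analyze_while(y: Interval, bound: int, step: int,
                  ancestors: List[Interval]) -> Interval:
    """Computes r_n for node n = (while x < bound do x := x + step, y).

    `ancestors` holds the inputs y_m of the enclosing nodes labeled by the
    same phrase, innermost last; they form an increasing chain, so checking
    the innermost one suffices for the subsumption test."""
    if ancestors:
        y_m = ancestors[-1]
        if leq(y, y_m):
            # Case (ii): back edge to m.  The conclusion of m reads
            # r = exit(y_m) join r, whose least fixpoint is exit(y_m).
            return guard_ge(y_m, bound)
        # Case (iii): approximate y further, then expand as in case (i).
        y = widen(y_m, join(y_m, y))
    # Case (i): expand with the (assumed) abstract rule for while.
    exit_part = guard_ge(y, bound)
    body_out = shift(guard_lt(y, bound), step)
    if body_out is None:
        return exit_part
    return join(exit_part,
                analyze_while(body_out, bound, step, ancestors + [y]))
```

Starting from x in [0, 0] with bound 100 and step 1, `analyze_while((0, 0), 100, 1, [])` returns (100, inf): the second occurrence of the loop triggers case (iii), whose widened input [0, +inf) subsumes the third occurrence, so case (ii) closes the back edge. The lost upper bound is the kind of imprecision discussed in the next paragraph.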

It should be stressed that, as far as efficiency is concerned, the algorithm outlined 
above can be improved by the adoption of well-studied memoization techniques;
as noted in [Sch97], by clearly separating design concerns from implementation 
concerns, the adopted methodology produces simpler proofs of correctness. Also 
note that the choice of the widening operator has a deep impact on the precision of 
the results obtained and, moreover, even a precise widening can lead to inaccurate 
results if applied too eagerly. However, precision problems can be mitigated by the 
application of suitable "widening delay" techniques [CC92b; HPR97; BHRZ05]. 
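One simple form of "widening delay" can be sketched as follows, again on a hypothetical interval domain: the first `delay` steps of the chain of Definition 8.1 use the plain join, and only later steps use the widening. The function name `delayed_chain` and the parameter `delay` are illustrative; this is only the simplest variant of the idea.

```python
import math
from typing import List, Optional, Tuple

Interval = Optional[Tuple[float, float]]   # None is bottom


def join(a: Interval, b: Interval) -> Interval:
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(y: Interval, x: Interval) -> Interval:
    # Interval widening, assuming y <= x.
    if y is None:
        return x
    return (y[0] if y[0] <= x[0] else -math.inf,
            y[1] if y[1] >= x[1] else math.inf)


def delayed_chain(xs: List[Interval], delay: int) -> List[Interval]:
    """y_0 = x_0 and y_{i+1} = y_i op (y_i join x_{i+1}), where op is a plain
    join for the first `delay` steps and the widening afterwards.  Since only
    finitely many joins are performed, termination is preserved."""
    ys = [xs[0]]
    for step, x in enumerate(xs[1:]):
        upper = join(ys[-1], x)
        ys.append(upper if step < delay else widen(ys[-1], upper))
    return ys
```

For the chain (0, 0), (0, 1), (0, 2), (0, 3), (0, 3), ..., a delay of 3 keeps the exact limit (0, 3), whereas delays of 0 or 2 widen too early and end with (0, inf).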



10 Unique labels (e.g., given by the address of the root node for q in the program parse tree) ensure
that different occurrences of the same syntax are not confused [Sch95]; this also means that, in
each node n, the type and execution environments $\beta$ and $\rho$ are uniquely determined by q.

11 As explained in [Sch95; Sch97; Sch98], the computation of such a least fixpoint (in the context
of a coinductive interpretation of the abstract rules) is justified by the fact that here we only
need to approximate the conclusions produced by the terminating concrete computations, i.e., by
the concrete rules that are interpreted inductively. Also note that the divergence rules have no
conclusion at all.

9. EXTENSIONS

In this section we outline how the techniques presented in the first part of the 
paper can be extended so as to encompass the C language and all the imperative 
aspects of C++ (including, of course, exceptions): Section 9.1 shows how the set of 
primitive types can be extended by discussing the introduction of bounded integer 
and floating-point types; Section 9.2 provides a sketch of how C-like pointers, arrays 
and records can be dealt with; dynamic memory allocation and deallocation is 
treated in Section 9.3; and Section 9.4 illustrates how all the non-structured control 
flow mechanisms of C and C++ can be accounted for. 

Once an ABI (Application Binary Interface) has been fixed and its characteristics
have been reflected into concrete and abstract memory structures, C struct and
union compound types can be accommodated, even in the presence of pointer casts
and unrestricted pointer arithmetic, by compiling down all their uses to memory
reads and writes performed through pointer dereferencing [Min06].

While we have not yet tried to incorporate object-oriented features (like classes, 
inheritance, method calls with dynamic binding and so forth) we do not see what, 
in the current design, would prevent such an extension. 

9.1 Additional Arithmetic Types 

The addition of more arithmetic types such as (signed and unsigned) finite integer 
and floating-point types is fairly straightforward. It is assumed that a preprocessor 
will add, as needed, a value cast operator that, for a given numeric type and constant 
expression, ensures that either the returned value is in the domain of that type or 
an appropriate exception is thrown. With this assumption, all the operations need 
only to be specified for operands of the very same type. 

9.1.1 Syntax. For floating-point numbers, we add a new basic type float that 
represents a fixed and finite subset of the reals together with a set of special values 
denoting infinities, NaN (Not a Number) values and so forth. The exact format and
range of a floating-point literal is unspecified. The addition of other floating-point 
types to represent double and extended precision numbers can be done the same 
way. To exemplify the inclusion of signed and unsigned bounded integer types, we 
also add the signed_char and unsigned_char basic types. 

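Integer types. $\mathrm{iT} \in \mathrm{iType} \stackrel{\mathrm{def}}{=} \{\mathtt{integer}, \mathtt{signed\_char}, \mathtt{unsigned\_char}, \ldots\}$;

Numeric types. $\mathrm{nT} \in \mathrm{nType} \stackrel{\mathrm{def}}{=} \mathrm{iType} \cup \{\mathtt{float}, \ldots\}$;

Basic types. $\mathrm{T} \in \mathrm{Type} \stackrel{\mathrm{def}}{=} \mathrm{nType} \cup \{\mathtt{boolean}\}$;
Floating-point literals, $\mathit{fl} \in \mathrm{Float}$;
Signed char literals, $\mathit{sc} \in \mathrm{sChar}$;
Unsigned char literals, $\mathit{uc} \in \mathrm{uChar}$.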

Expressions and constants. Expressions are extended with floating-point constants,
bounded integer constants, and vcast, a value cast operator for converting values
from one basic type to another, when possible, or yielding an appropriate exception:

$\mathrm{Exp} \ni e ::= \ldots \mid \mathit{fl} \mid \mathit{sc} \mid \mathit{uc} \mid \mathtt{vcast}(\mathrm{nT}, e)$
$\mathrm{Con} \ni \mathit{con} ::= \ldots \mid \mathit{fl} \mid \mathit{sc} \mid \mathit{uc}$.
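The functions $\mathrm{dom} : \mathrm{cType} \to \{\mathrm{Integer}, \mathrm{Bool}, \mathrm{RTSExcept}, \mathrm{Float}, \mathrm{sChar}, \mathrm{uChar}\}$
and $\mathrm{type} : \mathrm{sVal} \rightharpoonup \mathrm{sType}$ are easily extended:

$\mathrm{dom}(\mathtt{float}) \stackrel{\mathrm{def}}{=} \mathrm{Float}$, $\quad \mathrm{type}(\mathit{fl}) \stackrel{\mathrm{def}}{=} \mathtt{float}$,

$\mathrm{dom}(\mathtt{signed\_char}) \stackrel{\mathrm{def}}{=} \mathrm{sChar}$, $\quad \mathrm{type}(\mathit{sc}) \stackrel{\mathrm{def}}{=} \mathtt{signed\_char}$,

$\mathrm{dom}(\mathtt{unsigned\_char}) \stackrel{\mathrm{def}}{=} \mathrm{uChar}$, $\quad \mathrm{type}(\mathit{uc}) \stackrel{\mathrm{def}}{=} \mathtt{unsigned\_char}$.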

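9.1.2 Static Semantics. The required adjustments to functions FI and DI are
straightforward and thus omitted. Then, we add the following static semantic rules,
where $@ \in \{+, -, *, /, \%\}$ and $\boxplus \in \{=, \neq, <, \leq, \geq, >\}$:

Expressions.

$$\beta \vdash_I \mathit{fl} : \mathtt{float} \qquad \beta \vdash_I \mathit{sc} : \mathtt{signed\_char} \qquad \beta \vdash_I \mathit{uc} : \mathtt{unsigned\_char}$$

$$\frac{\beta \vdash_I e : \mathrm{nT}}{\beta \vdash_I -e : \mathrm{nT}} \qquad \frac{\beta \vdash_I e_0 : \mathrm{nT} \quad \beta \vdash_I e_1 : \mathrm{nT}}{\beta \vdash_I e_0 \mathbin{@} e_1 : \mathrm{nT}} \qquad \frac{\beta \vdash_I e_0 : \mathrm{nT} \quad \beta \vdash_I e_1 : \mathrm{nT}}{\beta \vdash_I e_0 \boxplus e_1 : \mathtt{boolean}}$$

$$\frac{\beta \vdash_I e : \mathrm{T}_0}{\beta \vdash_I \mathtt{vcast}(\mathrm{T}_1, e) : \mathrm{T}_1} \quad \text{if casting } \mathrm{T}_0 \text{ to } \mathrm{T}_1 \text{ is legal.}$$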

9.1.3 Concrete Dynamic Semantics. The added numeric types and the oper- 
ations upon them bring in a considerable degree of complexity. Consider the C 
language, for example: unsigned bounded integers employ modular arithmetic; for 
signed bounded integers, overflow yields undefined behavior; the results of floating- 
point operations depend on the rounding mode in effect and on the settings that 
cause floating-point exceptions to be trapped or ignored; relational operators may 
or may not raise a floating-point exception when one or both arguments are NaN. 
In order to factor out these details and delegate them to the memory structure, we 
resort to a device like the one used to model supported and unsupported language 
elements in the abstract semantics. We thus postulate the existence of the partial 
functions 



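$$\begin{aligned}
\mathrm{eval}_{\mathrm{vc}} &: (\mathrm{nType} \times \mathrm{Con} \times \mathrm{Mem}) \rightharpoonup \mathrm{ValState} \uplus \mathrm{ExceptState},\\
\mathrm{eval}_{-} &: (\mathrm{Con} \times \mathrm{Mem}) \rightharpoonup \mathrm{ValState} \uplus \mathrm{ExceptState},\\
\mathrm{eval}_{@} &: (\mathrm{Con} \times \mathrm{Con} \times \mathrm{Mem}) \rightharpoonup \mathrm{ValState} \uplus \mathrm{ExceptState},\\
\mathrm{eval}_{\boxplus} &: (\mathrm{Con} \times \mathrm{Con} \times \mathrm{Mem}) \rightharpoonup \mathrm{ValState} \uplus \mathrm{ExceptState},
\end{aligned}$$

that model the cast operator, unary minus, binary operators $@ \in \{+, -, *, /, \%\}$
and relational operators $\boxplus \in \{=, \neq, <, \leq, \geq, >\}$, respectively. Such functions need
not always be defined: for example, there is no need to define $\mathrm{eval}_{+}(\mathit{con}_0, \mathit{con}_1, \sigma)$
for the case $\mathrm{type}(\mathit{con}_0) \neq \mathrm{type}(\mathit{con}_1)$.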
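As an illustration of how such evaluation functions might encapsulate the language-specific details, here is a Python sketch of $\mathrm{eval}_{\mathrm{vc}}$ and $\mathrm{eval}_{@}$ (the latter under the hypothetical name `eval_binop`) for the two bounded integer types introduced above. Everything in it is an assumption made for the example: 8-bit chars, modular arithmetic for `unsigned_char`, an RTS exception for signed overflow and out-of-range casts (a choice of the sketch; C itself leaves signed overflow undefined), and C's integer promotions deliberately ignored, in line with the assumption that operations are only specified on operands of the very same type. Since Python integers do not carry their type, `nT` is passed explicitly.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


# Hypothetical encodings (not the paper's): a ValState is a pair (value, mem),
# an ExceptState is an RTSException carrying the memory, and memories are
# opaque.  Chars are assumed to be 8 bits wide.
@dataclass(frozen=True)
class RTSException:
    kind: str
    mem: Any


Result = Union[Tuple[int, Any], RTSException]
RANGES = {"signed_char": (-128, 127), "unsigned_char": (0, 255)}


def eval_vc(nT: str, con: int, mem: Any) -> Result:
    """Value cast: the value if it lies in the domain of nT, else an exception."""
    lo, hi = RANGES[nT]
    return (con, mem) if lo <= con <= hi else RTSException("overflow", mem)


def _c_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, as in C (Python's // floors)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def eval_binop(op: str, nT: str, c0: int, c1: int, mem: Any) -> Result:
    """A sketch of eval_@ for two operands of the same bounded integer type nT.

    Unsigned types wrap around modulo 2^8; for signed types this sketch
    chooses to raise an RTS exception on overflow."""
    if op in ("/", "%") and c1 == 0:
        return RTSException("divide_by_zero", mem)
    if op == "+":
        raw = c0 + c1
    elif op == "-":
        raw = c0 - c1
    elif op == "*":
        raw = c0 * c1
    elif op == "/":
        raw = _c_div(c0, c1)
    elif op == "%":
        raw = c0 - c1 * _c_div(c0, c1)
    else:
        raise ValueError(f"unsupported operator {op!r}")
    lo, hi = RANGES[nT]
    if nT.startswith("unsigned"):
        return (lo + (raw - lo) % (hi - lo + 1), mem)
    return eval_vc(nT, raw, mem)
```

For instance, `eval_binop("+", "unsigned_char", 250, 10, m)` wraps to 4, while the same addition on `signed_char` operands 120 and 10 yields an overflow exception state.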
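Value casts. The following concrete rule schemata use the corresponding evalua-
tion function to specify the execution of the vcast operator.

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \varepsilon}{\rho \vdash_\beta \langle \mathtt{vcast}(\mathrm{nT}, e), \sigma \rangle \longrightarrow \varepsilon} \qquad \frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \langle \mathit{con}, \sigma_0 \rangle}{\rho \vdash_\beta \langle \mathtt{vcast}(\mathrm{nT}, e), \sigma \rangle \longrightarrow \mathrm{eval}_{\mathrm{vc}}(\mathrm{nT}, \mathit{con}, \sigma_0)}$$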

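Arithmetic evaluation. By using the evaluation functions, we can substitute rules
(5), (8) and (9) with the following (note that they also capture the case when a
divide-by-zero exception is thrown):

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \langle \mathit{con}, \sigma_0 \rangle}{\rho \vdash_\beta \langle -e, \sigma \rangle \longrightarrow \mathrm{eval}_{-}(\mathit{con}, \sigma_0)} \qquad \frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \longrightarrow \langle \mathit{con}_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \longrightarrow \langle \mathit{con}_1, \sigma_1 \rangle}{\rho \vdash_\beta \langle e_0 \mathbin{@} e_1, \sigma \rangle \longrightarrow \mathrm{eval}_{@}(\mathit{con}_0, \mathit{con}_1, \sigma_1)}$$

Arithmetic tests. Similarly, rule (12) is replaced by the more general rule

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \longrightarrow \langle \mathit{con}_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \longrightarrow \langle \mathit{con}_1, \sigma_1 \rangle}{\rho \vdash_\beta \langle e_0 \boxplus e_1, \sigma \rangle \longrightarrow \mathrm{eval}_{\boxplus}(\mathit{con}_0, \mathit{con}_1, \sigma_1)}$$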

9.2 C-like Pointers, Arrays and Records 

9.2.1 Syntax. Recall that in Sections 3 and 4 we defined the set of storable types, 
whose values can be read from and written to memory, and the set of denotable 
types, that can occur in declarations. The introduction of pointer, array and record 
types requires the adoption of a finer classification. The set of all memory types is 
partitioned into object types and function types: the latter differ in that we cannot
read or update the "value" of a function; rather, we execute it. Object types 
are further partitioned into elementary types (also called scalar types, including 
basic types and pointer types) and aggregate types (arrays and records). All the 
elementary types are storable, meaning that their values can be read directly from 
or written directly to memory, as well as passed to and returned from functions. 
Regarding aggregate types, the C language prescribes that record types are storable, 
whereas array types are not. Pointer, array and record type derivations can be 
applied repeatedly to obtain, e.g., multi-dimensional arrays. 



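Types.

$$\begin{aligned}
\mathrm{eType} \ni \mathrm{eT} &::= \mathrm{T} \mid \mathrm{pT} & \qquad \mathrm{oType} \ni \mathrm{oT} &::= \mathrm{sT} \mid \mathrm{aT}\\
\mathrm{pType} \ni \mathrm{pT} &::= \mathrm{mT}* & \qquad \mathrm{fType} \ni \mathrm{fT} &::= \mathit{fps} \to \mathrm{sT}\\
\mathrm{sType} \ni \mathrm{sT} &::= \mathrm{eT} \mid \mathrm{rT} & \qquad \mathrm{mType} \ni \mathrm{mT} &::= \mathrm{oT} \mid \mathrm{fT}\\
\mathrm{aType} \ni \mathrm{aT} &::= \mathtt{array}\ m\ \mathtt{of}\ \mathrm{oT} & \qquad \mathrm{dType} \ni \mathrm{dT} &::= \mathrm{mT}\ \mathtt{loc}\\
\mathrm{rType} \ni \mathrm{rT} &::= \mathtt{record}\ \mathrm{id}\ \mathtt{of}\ \mathrm{id}_1 : \mathrm{oT}_1; \ldots; \mathrm{id}_j : \mathrm{oT}_j
\end{aligned}$$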



We assume, without loss of generality, that the field names of record types are 
unique across the entire program (for example, idi, . . . , idj could contain id as 
some kind of special prefix). 

Identifiers are no longer the only way to denote a memory structure location.
A location can also be denoted by combining a pointer with the indirection operator
'*', an array with the indexing operator, or a record with the field selection operator.
Hence, we introduce the concept of lvalue, which can be read as "location-valued
expression."

Offsets and lvalues.

$\mathrm{Offset} \ni o ::= \square \mid [e] \cdot o \mid .\mathrm{id} \cdot o$
$\mathrm{LValue} \ni \mathit{lval} ::= \mathrm{id} \cdot o \mid (*e) \cdot o$

Consequently, the syntactic production for expressions generating identifiers, as
well as the productions for statements generating assignments and function calls,
are replaced by more general versions using lvalues; expressions and declarations
are also extended with the address-of operator, null pointers and array variables.

Expressions, declarations and statements.

$\mathrm{Exp} \ni e ::= \ldots \mid \mathtt{val}\ \mathit{lval} \mid \&\,\mathit{lval} \mid (\mathrm{pT})\ 0$ $\qquad \mathrm{Glob} \ni g ::= \ldots \mid \mathtt{gvar}\ \mathrm{id} : \mathrm{aT} = e$
$\mathrm{Stmt} \ni s ::= \ldots \mid \mathit{lval} := e \mid \mathit{lval} := e(\mathit{es})$ $\qquad \mathrm{Decl} \ni d ::= \ldots \mid \mathtt{lvar}\ \mathrm{id} : \mathrm{aT} = e$

9.2.2 Static Semantics. The required adjustments to functions FI and DI are 
straightforward and thus omitted. The well-typedness of offsets and lvalues is 
encoded by the following predicates: 

$\beta, \mathrm{dT}_0 \vdash_I o : \mathrm{dT}_1$, o is compatible with $\mathrm{dT}_0$ and has type $\mathrm{dT}_1$ in $\beta$;

$\beta \vdash_I \mathit{lval} : \mathrm{dT}$, lval is well-formed and has type dT in $\beta$.

The static semantics is thus extended by the following rules.$^{12}$ Note that the eval-
uation of an lvalue as an expression (val lval) causes a suitable type conversion,
sometimes referred to as "type decay." Pointer arithmetic can only be applied
to pointers to object types. In function calls, the callee is specified via an expression having
function pointer type (typically resulting from a type decay).

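Offset.

$$\beta, \mathrm{dT} \vdash_I \square : \mathrm{dT} \qquad \frac{\beta \vdash_I e : \mathtt{integer} \quad \beta, \mathrm{oT}\ \mathtt{loc} \vdash_I o : \mathrm{dT}}{\beta, (\mathtt{array}\ m\ \mathtt{of}\ \mathrm{oT})\ \mathtt{loc} \vdash_I [e] \cdot o : \mathrm{dT}}$$

$$\frac{\beta, \mathrm{oT}_i\ \mathtt{loc} \vdash_I o : \mathrm{dT}}{\beta, (\mathtt{record}\ \mathrm{id}\ \mathtt{of}\ \mathrm{id}_1 : \mathrm{oT}_1; \ldots; \mathrm{id}_j : \mathrm{oT}_j)\ \mathtt{loc} \vdash_I .\mathrm{id}_i \cdot o : \mathrm{dT}} \quad \text{if } i \in \{1, \ldots, j\}$$

Lvalue.

$$\frac{\beta, \mathrm{dT} \vdash_I o : \mathrm{dT}_1}{\beta \vdash_I \mathrm{id} \cdot o : \mathrm{dT}_1} \quad \text{if } \beta(\mathrm{id}) = \mathrm{dT} \qquad \frac{\beta \vdash_I e : \mathrm{mT}* \quad \beta, \mathrm{mT}\ \mathtt{loc} \vdash_I o : \mathrm{dT}}{\beta \vdash_I (*e) \cdot o : \mathrm{dT}}$$

Null pointer and address-of operator.

$$\beta \vdash_I (\mathrm{pT})\ 0 : \mathrm{pT} \qquad \frac{\beta \vdash_I \mathit{lval} : \mathrm{mT}\ \mathtt{loc}}{\beta \vdash_I \&\,\mathit{lval} : \mathrm{mT}*}$$

12 The previous rules for identifier, assignment and function call are no longer used.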
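Type decay.

$$\frac{\beta \vdash_I \mathit{lval} : \mathrm{sT}\ \mathtt{loc}}{\beta \vdash_I \mathtt{val}\ \mathit{lval} : \mathrm{sT}} \qquad \frac{\beta \vdash_I \mathit{lval} : (\mathtt{array}\ m\ \mathtt{of}\ \mathrm{oT})\ \mathtt{loc}}{\beta \vdash_I \mathtt{val}\ \mathit{lval} : \mathrm{oT}*} \qquad \frac{\beta \vdash_I \mathit{lval} : \mathrm{fT}\ \mathtt{loc}}{\beta \vdash_I \mathtt{val}\ \mathit{lval} : \mathrm{fT}*}$$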
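A small Python sketch of the type-decay rules, together with the pointer cases of + and - that follow, may help. The class names (`Basic`, `Ptr`, `Array`, `Fun`, `LocT`) and the functions are hypothetical; record types are omitted, and integer arithmetic is left to the rules of Section 9.1.2.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical representation of the memory and denotable types of Section 9.2.
@dataclass(frozen=True)
class Basic:
    name: str                       # e.g. "integer", "boolean"


@dataclass(frozen=True)
class Ptr:
    target: "MType"                 # mT*


@dataclass(frozen=True)
class Array:
    size: int
    elem: "MType"                   # an object type oT


@dataclass(frozen=True)
class Fun:
    params: Tuple["MType", ...]
    result: "MType"                 # fps -> sT


@dataclass(frozen=True)
class LocT:
    of: "MType"                     # mT loc


MType = Union[Basic, Ptr, Array, Fun]
INTEGER = Basic("integer")


def decay(t: LocT) -> MType:
    """Type of `val lval` when lval has type t (the three type-decay rules)."""
    if isinstance(t.of, Array):
        return Ptr(t.of.elem)       # (array m of oT) loc  gives  oT*
    if isinstance(t.of, Fun):
        return Ptr(t.of)            # fT loc  gives  fT*
    return t.of                     # sT loc  gives  sT


def is_object_ptr(t: MType) -> bool:
    """Pointer arithmetic is allowed only on pointers to object types."""
    return isinstance(t, Ptr) and not isinstance(t.target, Fun)


def type_plus(t0: MType, t1: MType) -> MType:
    if is_object_ptr(t0) and t1 == INTEGER:
        return t0
    if t0 == INTEGER and is_object_ptr(t1):
        return t1
    raise TypeError("no pointer-arithmetic rule for +")


def type_minus(t0: MType, t1: MType) -> MType:
    if is_object_ptr(t0) and t1 == INTEGER:
        return t0
    if is_object_ptr(t0) and t0 == t1:
        return INTEGER
    raise TypeError("no pointer-arithmetic rule for -")
```

Note how a two-dimensional array decays to a pointer to its rows: `decay(LocT(Array(3, Array(4, INTEGER))))` is `Ptr(Array(4, INTEGER))`.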

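Pointer arithmetic.

$$\frac{\beta \vdash_I e_0 : \mathrm{oT}* \quad \beta \vdash_I e_1 : \mathtt{integer}}{\beta \vdash_I e_0 + e_1 : \mathrm{oT}*} \qquad \frac{\beta \vdash_I e_0 : \mathtt{integer} \quad \beta \vdash_I e_1 : \mathrm{oT}*}{\beta \vdash_I e_0 + e_1 : \mathrm{oT}*}$$

$$\frac{\beta \vdash_I e_0 : \mathrm{oT}* \quad \beta \vdash_I e_1 : \mathtt{integer}}{\beta \vdash_I e_0 - e_1 : \mathrm{oT}*} \qquad \frac{\beta \vdash_I e_0 : \mathrm{oT}* \quad \beta \vdash_I e_1 : \mathrm{oT}*}{\beta \vdash_I e_0 - e_1 : \mathtt{integer}}$$

Pointer comparison.

$$\frac{\beta \vdash_I e_0 : \mathrm{pT} \quad \beta \vdash_I e_1 : \mathrm{pT}}{\beta \vdash_I e_0 \boxplus e_1 : \mathtt{boolean}} \quad \text{where } \boxplus \in \{=, \neq, <, \leq, \geq, >\}.$$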

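Assignment and function call.

$$\frac{\beta \vdash_I \mathit{lval} : \mathrm{sT}\ \mathtt{loc} \quad \beta \vdash_I e : \mathrm{sT}}{\beta \vdash_I \mathit{lval} := e} \qquad \frac{\beta \vdash_I \mathit{lval} : \mathrm{sT}\ \mathtt{loc} \quad \beta \vdash_I e : (\mathit{fps} \to \mathrm{sT})* \quad \beta, \mathit{fps} \vdash_I \mathit{es}}{\beta \vdash_I \mathit{lval} := e(\mathit{es})}$$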

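(Multi-dimensional) Global array declaration.

$$\frac{\beta \vdash_I \mathtt{gvar}\ \mathrm{id} : \mathrm{oT} = e : \{\mathrm{id} \mapsto \mathrm{oT}\ \mathtt{loc}\}}{\beta \vdash_I \mathtt{gvar}\ \mathrm{id} : \mathtt{array}\ m\ \mathtt{of}\ \mathrm{oT} = e : \{\mathrm{id} \mapsto (\mathtt{array}\ m\ \mathtt{of}\ \mathrm{oT})\ \mathtt{loc}\}} \quad \text{if } m > 0$$

The static semantic rule for a local array declaration is similar.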

9.2.3 Concrete Dynamic Semantics. Concrete execution environments now map
function identifiers to (properly typed) locations, rather than function abstracts;
hence, we redefine $\mathrm{dVal} \stackrel{\mathrm{def}}{=} \mathrm{Addr} \times \mathrm{mType}$.

A proper handling of aggregate and function types in memory structures requires 
a few semantic adjustments and extensions. New memory functions allow the allo- 
cation of function abstracts in the text segment, as well as the contiguous allocation 
of a number of memory cells, so as to model (multi-dimensional) arrays: 

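$$\begin{aligned}
\mathrm{new}_t &: (\mathrm{Abstract} \times \mathrm{Mem}) \to \bigl((\mathrm{Mem} \times \mathrm{Loc}) \uplus \mathrm{ExceptState}\bigr),\\
\mathrm{newarray}_d &: (\mathrm{Integer} \times \mathrm{ValState}) \to \bigl((\mathrm{Mem} \times \mathrm{Loc}) \uplus \mathrm{ExceptState}\bigr),\\
\mathrm{newarray}_s &: (\mathrm{Integer} \times \mathrm{ValState}) \to \bigl((\mathrm{Mem} \times \mathrm{Ind}) \uplus \mathrm{ExceptState}\bigr).
\end{aligned}$$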

It can be observed that the properties stated in Definition 5.2 still hold as long as 
we consider locations having non-aggregate type and properly extend the domain 
and codomain of the absolute memory map: 

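$$\mathrm{Map} \stackrel{\mathrm{def}}{=} \bigl(\mathrm{Loc} \times (\mathrm{cType} \uplus \mathrm{fType})\bigr) \rightharpoonup (\mathrm{Con} \uplus \mathrm{Loc} \uplus \mathrm{Abstract}).$$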

These "elementary" memory maps need to be extended to read or update record 
values. To this end, we assume the existence of a couple of helper functions working
on locations having aggregate type:

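$$\begin{aligned}
\mathrm{locfield} &: (\mathrm{Id} \times \mathrm{Loc} \times \mathrm{rType}) \rightharpoonup (\mathrm{Loc} \times \mathrm{oType}),\\
\mathrm{locindex} &: (\mathrm{Integer} \times \mathrm{Loc} \times \mathrm{aType}) \rightharpoonup (\mathrm{Loc} \times \mathrm{oType}).
\end{aligned}$$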

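Intuitively, when defined, these functions map a record (resp., array) typed location
to the typed location of one of its record fields (resp., array elements). Hence, for
each $\mu \in \mathrm{Map}$, the extension $\mu : (\mathrm{Loc} \times \mathrm{sType}) \rightharpoonup \mathrm{sVal}$ can be recursively obtained,
for each $l \in \mathrm{Loc}$ and $\mathrm{rT} = \mathtt{record}\ \mathrm{id}\ \mathtt{of}\ \mathrm{id}_1 : \mathrm{oT}_1; \ldots; \mathrm{id}_j : \mathrm{oT}_j$, as follows:

$$\mu(l, \mathrm{rT}) \stackrel{\mathrm{def}}{=} \bigl\langle \mu(\mathrm{locfield}(\mathrm{id}_1, l, \mathrm{rT})), \ldots, \mu(\mathrm{locfield}(\mathrm{id}_j, l, \mathrm{rT})) \bigr\rangle,$$

where, for each $l \in \mathrm{Loc}$ and $\mathrm{aT} = \mathtt{array}\ m\ \mathtt{of}\ \mathrm{oT} \in \mathrm{aType}$,

$$\mu(l, \mathrm{aT}) \stackrel{\mathrm{def}}{=} \bigl\langle \mu(\mathrm{locindex}(0, l, \mathrm{aT})), \ldots, \mu(\mathrm{locindex}(m - 1, l, \mathrm{aT})) \bigr\rangle.$$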

A similar extension is required for the memory update operator. Note that we will
still use $\upsilon$ as a syntactic meta-variable for $\mathrm{ValState} = \mathrm{sVal} \times \mathrm{Mem}$, but now its first
component can be either a constant, or an absolute location, or a record value.
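The recursive extension of $\mu$ can be sketched in Python under an explicitly hypothetical layout: integer locations, one cell per elementary value, and no padding. The paper leaves such details to the memory structure (and, ultimately, to the ABI), so `size`, `locfield`, `locindex` and `read` below are illustrations only; records and arrays are both rendered as tuples.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


# Hypothetical layout (not fixed by the paper or by any ABI): locations are
# integers, every elementary value takes one cell, and there is no padding.
@dataclass(frozen=True)
class Array:
    size: int
    elem: "Ty"


@dataclass(frozen=True)
class Record:
    name: str
    fields: Tuple[Tuple[str, "Ty"], ...]


Ty = Union[str, Array, Record]       # a str names an elementary type


def size(t: Ty) -> int:
    if isinstance(t, Array):
        return t.size * size(t.elem)
    if isinstance(t, Record):
        return sum(size(ft) for _, ft in t.fields)
    return 1


def locfield(fid: str, l: int, rT: Record) -> Tuple[int, Ty]:
    """Typed location of field fid; partial, like locfield in the text."""
    offset = 0
    for name, ft in rT.fields:
        if name == fid:
            return (l + offset, ft)
        offset += size(ft)
    raise KeyError(fid)


def locindex(i: int, l: int, aT: Array) -> Tuple[int, Ty]:
    """Typed location of element i; partial, like locindex in the text."""
    if not 0 <= i < aT.size:
        raise IndexError(i)
    return (l + i * size(aT.elem), aT.elem)


def read(mu: Dict[Tuple[int, str], object], l: int, t: Ty):
    """Extends the elementary map mu to aggregate types, recursively."""
    if isinstance(t, Record):
        return tuple(read(mu, *locfield(fid, l, t)) for fid, _ in t.fields)
    if isinstance(t, Array):
        return tuple(read(mu, *locindex(i, l, t)) for i in range(t.size))
    return mu[(l, t)]
```

With a record holding an array of two points followed by a boolean tag, stored from location 10, `read` reassembles the nested value `(((1, 2), (3, 4)), True)` from five elementary cells.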

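Pointer and array indexing errors are modeled via RTS exceptions. It is as-
sumed there exists a special location $l_{\mathrm{null}} \in \mathrm{Loc}$ (the null pointer value) such that
$(l_{\mathrm{null}}, \mathrm{mT}) \notin \mathrm{dom}(\sigma)$ for all $\sigma \in \mathrm{Mem}$ and $\mathrm{mT} \in \mathrm{mType}$; this also implies that
$l_{\mathrm{null}}$ cannot be returned by the memory allocation operators. Hence, any attempt
to read from or write to memory through this location will result in an exception
state. Suitable operators on memory structures are required to check the constraints
regarding pointer arithmetic (e.g., out-of-bounds array accesses), pointer compar-
isons (where $\boxplus$ ranges over $\{=, \neq, <, \leq, \geq, >\}$) and to perform "array-to-pointer
decay" conversions or record field selections:

$$\begin{aligned}
\mathrm{ptrmove} &: (\mathrm{Integer} \times \mathrm{Loc} \times \mathrm{Mem}) \to \mathrm{ValState} \uplus \mathrm{ExceptState},\\
\mathrm{ptrdiff} &: (\mathrm{Loc} \times \mathrm{Loc} \times \mathrm{Mem}) \to \mathrm{ValState} \uplus \mathrm{ExceptState},\\
\mathrm{ptrcmp}_{\boxplus} &: (\mathrm{Loc} \times \mathrm{Loc} \times \mathrm{Mem}) \to \mathrm{ValState} \uplus \mathrm{ExceptState},\\
\mathrm{firstof} &: (\mathrm{Loc} \times \mathrm{Mem}) \to \mathrm{ValState} \uplus \mathrm{ExceptState},\\
\mathrm{field} &: (\mathrm{Id} \times \mathrm{Loc} \times \mathrm{Mem}) \to \mathrm{ValState} \uplus \mathrm{ExceptState}.
\end{aligned}$$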

Note that array indexing is semantically equivalent to a suitable combination of
type decay, pointer arithmetic and pointer indirection. Nonetheless, for the sake
of clarity and also to simplify the application of pointer and array dependence
analyses [EGH94], we keep the two constructs distinct and, to simplify
notation, we define$^{13}$

$$\mathrm{index} : (\mathrm{Loc} \times \mathrm{ValState}) \rightharpoonup \mathrm{ValState} \uplus \mathrm{ExceptState}$$

as follows:

$$\mathrm{index}\bigl(l, (m, \sigma)\bigr) \stackrel{\mathrm{def}}{=} \begin{cases} \varepsilon, & \text{if } \mathrm{firstof}(l, \sigma) = \varepsilon;\\ \mathrm{ptrmove}(m, l_0, \sigma_0), & \text{if } \mathrm{firstof}(l, \sigma) = (l_0, \sigma_0). \end{cases}$$
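The definition of index can be sketched in Python on a hypothetical block-based memory model, where a location is a (block, offset) pair and the memory only records how many elements each live block has. The encodings, the exception kinds and the choice to accept one-past-the-end pointers (which C allows to be formed, though not dereferenced) are assumptions of this sketch, not requirements of the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Hypothetical concrete memory (not the paper's): a location is a pair
# (block, offset) and Mem records the number of elements of each live block.
Loc = Tuple[int, int]
NULL: Loc = (0, 0)          # block 0 is never allocated, like l_null


@dataclass
class Mem:
    blocks: Dict[int, int]


@dataclass
class RTSException:
    kind: str
    mem: Mem


Result = Union[Tuple[Loc, Mem], RTSException]


def firstof(l: Loc, mem: Mem) -> Result:
    """Array-to-pointer decay; in this flat layout an array and its first
    element share the same location, so only validity is checked."""
    if l[0] not in mem.blocks:
        return RTSException("invalid_pointer", mem)
    return (l, mem)


def ptrmove(m: int, l: Loc, mem: Mem) -> Result:
    """Moves l by m elements, allowing one-past-the-end as C does."""
    block, off = l
    if block not in mem.blocks:
        return RTSException("invalid_pointer", mem)
    new_off = off + m
    if not 0 <= new_off <= mem.blocks[block]:
        return RTSException("out_of_bounds", mem)
    return ((block, new_off), mem)


def index(l: Loc, v: Tuple[int, Mem]) -> Result:
    """index(l, (m, sigma)): firstof first, then ptrmove on its result."""
    m, mem = v
    first = firstof(l, mem)
    if isinstance(first, RTSException):
        return first
    l0, mem0 = first
    return ptrmove(m, l0, mem0)
```

Any exception raised by firstof is propagated unchanged, exactly as in the first case of the definition.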



13 Functions 'field' and 'index' are similar to 'locfield' and 'locindex', but they are also meant to 
check their arguments against the memory structure, possibly returning an RTS exception. 
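Non-terminal and terminal configurations are extended so as to allow for the
syntactic categories of offsets and lvalues, whose non-exceptional evaluation leads
to a location:

$$\begin{aligned}
\Gamma^\beta_o &\stackrel{\mathrm{def}}{=} \bigl\{ \langle o, l, \sigma \rangle \in \mathrm{Offset} \times \mathrm{Loc} \times \mathrm{Mem} \bigm| \exists \mathrm{dT}_0, \mathrm{dT}_1 \in \mathrm{dType} \mathrel{.} \beta, \mathrm{dT}_0 \vdash_{\mathrm{FI}(o)} o : \mathrm{dT}_1 \bigr\},\\
\Gamma^\beta_l &\stackrel{\mathrm{def}}{=} \bigl\{ \langle \mathit{lval}, \sigma \rangle \in \mathrm{LValue} \times \mathrm{Mem} \bigm| \exists \mathrm{dT} \in \mathrm{dType} \mathrel{.} \beta \vdash_{\mathrm{FI}(\mathit{lval})} \mathit{lval} : \mathrm{dT} \bigr\},\\
T_o &\stackrel{\mathrm{def}}{=} T_l \stackrel{\mathrm{def}}{=} (\mathrm{Loc} \times \mathrm{Mem}) \uplus \mathrm{ExceptState}.
\end{aligned}$$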
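The dynamic concrete semantics is extended with the following rule schemata.

Offset.

$$\rho \vdash_\beta \langle \square, l, \sigma \rangle \longrightarrow \langle l, \sigma \rangle \qquad \frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \varepsilon}{\rho \vdash_\beta \langle [e] \cdot o, l, \sigma \rangle \longrightarrow \varepsilon}$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \upsilon}{\rho \vdash_\beta \langle [e] \cdot o, l, \sigma \rangle \longrightarrow \varepsilon} \quad \text{if } \mathrm{index}(l, \upsilon) = \varepsilon$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \upsilon \quad \rho \vdash_\beta \langle o, l_0, \sigma_0 \rangle \longrightarrow \eta}{\rho \vdash_\beta \langle [e] \cdot o, l, \sigma \rangle \longrightarrow \eta} \quad \text{if } \mathrm{index}(l, \upsilon) = (l_0, \sigma_0)$$

$$\rho \vdash_\beta \langle .\mathrm{id}_i \cdot o, l, \sigma \rangle \longrightarrow \varepsilon \quad \text{if } \mathrm{field}(\mathrm{id}_i, l, \sigma) = \varepsilon$$

$$\frac{\rho \vdash_\beta \langle o, l_0, \sigma_0 \rangle \longrightarrow \eta}{\rho \vdash_\beta \langle .\mathrm{id}_i \cdot o, l, \sigma \rangle \longrightarrow \eta} \quad \text{if } \mathrm{field}(\mathrm{id}_i, l, \sigma) = (l_0, \sigma_0)$$

Lvalue.

$$\frac{\rho \vdash_\beta \langle o, a@\sigma, \sigma \rangle \longrightarrow \eta}{\rho \vdash_\beta \langle \mathrm{id} \cdot o, \sigma \rangle \longrightarrow \eta} \quad \text{if } \rho(\mathrm{id}) = (a, \mathrm{mT})$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \varepsilon}{\rho \vdash_\beta \langle (*e) \cdot o, \sigma \rangle \longrightarrow \varepsilon} \qquad \frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \langle l_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle o, l_0, \sigma_0 \rangle \longrightarrow \eta}{\rho \vdash_\beta \langle (*e) \cdot o, \sigma \rangle \longrightarrow \eta}$$

Null pointer and address-of operator.

$$\rho \vdash_\beta \langle (\mathrm{pT})\ 0, \sigma \rangle \longrightarrow \langle l_{\mathrm{null}}, \sigma \rangle \qquad \frac{\rho \vdash_\beta \langle \mathit{lval}, \sigma \rangle \longrightarrow \eta}{\rho \vdash_\beta \langle \&\,\mathit{lval}, \sigma \rangle \longrightarrow \eta}$$

Type decay.

$$\frac{\rho \vdash_\beta \langle \mathit{lval}, \sigma \rangle \longrightarrow \varepsilon}{\rho \vdash_\beta \langle \mathtt{val}\ \mathit{lval}, \sigma \rangle \longrightarrow \varepsilon} \qquad \frac{\rho \vdash_\beta \langle \mathit{lval}, \sigma \rangle \longrightarrow \langle l, \sigma_0 \rangle}{\rho \vdash_\beta \langle \mathtt{val}\ \mathit{lval}, \sigma \rangle \longrightarrow \sigma_0[l, \mathrm{sT}]} \quad \text{if } \beta \vdash_{\mathrm{FI}(\mathit{lval})} \mathit{lval} : \mathrm{sT}\ \mathtt{loc}$$

$$\frac{\rho \vdash_\beta \langle \mathit{lval}, \sigma \rangle \longrightarrow \upsilon}{\rho \vdash_\beta \langle \mathtt{val}\ \mathit{lval}, \sigma \rangle \longrightarrow \mathrm{firstof}(\upsilon)} \quad \text{if } \beta \vdash_{\mathrm{FI}(\mathit{lval})} \mathit{lval} : \mathrm{aT}\ \mathtt{loc}$$

$$\frac{\rho \vdash_\beta \langle \mathit{lval}, \sigma \rangle \longrightarrow \upsilon}{\rho \vdash_\beta \langle \mathtt{val}\ \mathit{lval}, \sigma \rangle \longrightarrow \upsilon} \quad \text{if } \beta \vdash_{\mathrm{FI}(\mathit{lval})} \mathit{lval} : \mathrm{fT}\ \mathtt{loc}$$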



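Pointer arithmetic. Let @ denote a binary abstract syntax operator in $\{+, -\}$,
as well as the corresponding unary operation on integers. Then, the following are
added to rule schemata (6)–(9).

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \longrightarrow \langle l, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \longrightarrow \varepsilon}{\rho \vdash_\beta \langle e_0 \mathbin{@} e_1, \sigma \rangle \longrightarrow \varepsilon}$$

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \longrightarrow \langle l, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \longrightarrow \langle m, \sigma_1 \rangle}{\rho \vdash_\beta \langle e_0 \mathbin{@} e_1, \sigma \rangle \longrightarrow \mathrm{ptrmove}(m_0, l, \sigma_1)} \quad \text{if } m_0 = @m$$

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \longrightarrow \langle m, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \longrightarrow \langle l, \sigma_1 \rangle}{\rho \vdash_\beta \langle e_0 + e_1, \sigma \rangle \longrightarrow \mathrm{ptrmove}(m, l, \sigma_1)}$$

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \longrightarrow \langle l_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \longrightarrow \langle l_1, \sigma_1 \rangle}{\rho \vdash_\beta \langle e_0 - e_1, \sigma \rangle \longrightarrow \mathrm{ptrdiff}(l_0, l_1, \sigma_1)}$$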

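Pointer comparison. Let $\boxplus$ denote a binary abstract syntax operator in the set
$\{=, \neq, <, \leq, \geq, >\}$. Then, the following are added to rule schemata (10)–(12).

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \longrightarrow \langle l_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \longrightarrow \varepsilon}{\rho \vdash_\beta \langle e_0 \boxplus e_1, \sigma \rangle \longrightarrow \varepsilon}$$

$$\frac{\rho \vdash_\beta \langle e_0, \sigma \rangle \longrightarrow \langle l_0, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e_1, \sigma_0 \rangle \longrightarrow \langle l_1, \sigma_1 \rangle}{\rho \vdash_\beta \langle e_0 \boxplus e_1, \sigma \rangle \longrightarrow \mathrm{ptrcmp}_{\boxplus}(l_0, l_1, \sigma_1)}$$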
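Assignment.

$$\frac{\rho \vdash_\beta \langle \mathit{lval}, \sigma \rangle \longrightarrow \varepsilon}{\rho \vdash_\beta \langle \mathit{lval} := e, \sigma \rangle \longrightarrow \varepsilon} \qquad \frac{\rho \vdash_\beta \langle \mathit{lval}, \sigma \rangle \longrightarrow \langle l, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e, \sigma_0 \rangle \longrightarrow \varepsilon}{\rho \vdash_\beta \langle \mathit{lval} := e, \sigma \rangle \longrightarrow \varepsilon}$$

$$\frac{\rho \vdash_\beta \langle \mathit{lval}, \sigma \rangle \longrightarrow \langle l, \sigma_0 \rangle \quad \rho \vdash_\beta \langle e, \sigma_0 \rangle \longrightarrow \langle \mathrm{sval}, \sigma_1 \rangle}{\rho \vdash_\beta \langle \mathit{lval} := e, \sigma \rangle \longrightarrow \sigma_1\bigl[(l, \mathrm{sT}) := \mathrm{sval}\bigr]} \quad \text{if } \beta \vdash_{\mathrm{FI}(e)} e : \mathrm{sT}$$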



Similar changes are required for the case of a function call. First, the lvalue is
evaluated so as to obtain the target location where the result of the function call 
will be stored; then, the function designator (an expression) is evaluated to obtain a 
location having function type; this location is fed to the memory structure so as to 
obtain the function abstract. All the other computation steps, including parameter 
passing, are performed as before. On exit from the function call, the return value is 
stored at the location computed in the first step. Exceptions are eventually detected 
and propagated as usual. Also note that, thanks to the rules for type decay, arrays 
and functions can be passed to and returned from function calls. 
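(Multi-dimensional) Global array declaration. In the following rule schemata, let
$n > 0$, $\mathrm{aT} = \mathtt{array}\ m_1\ \mathtt{of}\ (\ldots(\mathtt{array}\ m_n\ \mathtt{of}\ \mathrm{sT})\ldots)$ and $m = m_1 \times \cdots \times m_n$.

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \eta}{\rho \vdash_\beta \langle \mathtt{gvar}\ \mathrm{id} : \mathrm{aT} = e, \sigma \rangle \longrightarrow \mathrm{cleanup}_d(\varepsilon)} \quad \text{if either } \eta = \varepsilon, \text{ or } \eta = \upsilon \text{ and } \mathrm{newarray}_d(m, \upsilon) = \varepsilon;$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \upsilon}{\rho \vdash_\beta \langle \mathtt{gvar}\ \mathrm{id} : \mathrm{aT} = e, \sigma \rangle \longrightarrow \langle \rho_0, \sigma_0 \rangle} \quad \text{if } \mathrm{newarray}_d(m, \upsilon) = (\sigma_0, l) \text{ and } \rho_0 = \{\mathrm{id} \mapsto (l, \mathrm{aT})\}.$$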

The rules for local array declaration are similar. Since function abstracts are now 
stored in memory structures, a few minor adaptations, omitted for space reasons, 
are also required for the rule of function declarations (which uses $\mathrm{new}_t$) and the
rules for recursive environments and declarations. 

9.3 Heap Memory Management 

By adding a heap segment to memory structures, as well as suitable helper functions 
($\mathrm{new}_h$, $\mathrm{delete}_h$ and the corresponding array versions), it is possible to further extend
the language to embrace dynamic memory allocation and deallocation. 

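9.3.1 Syntax. We add an allocation expression and a deallocation statement:

$\mathrm{Exp} \ni e ::= \ldots \mid \mathtt{new}\ \mathrm{sT} = e$

$\mathrm{Stmt} \ni s ::= \ldots \mid \mathtt{delete}\ e$

9.3.2 Static Semantics.

$$\frac{\beta \vdash_I e : \mathrm{sT}}{\beta \vdash_I \mathtt{new}\ \mathrm{sT} = e : \mathrm{sT}*} \qquad \frac{\beta \vdash_I e : \mathrm{sT}*}{\beta \vdash_I \mathtt{delete}\ e}$$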

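9.3.3 Concrete Dynamic Semantics. This is extended with the following schemata:

New expression.

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \varepsilon}{\rho \vdash_\beta \langle \mathtt{new}\ \mathrm{sT} = e, \sigma \rangle \longrightarrow \varepsilon} \qquad \frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \upsilon}{\rho \vdash_\beta \langle \mathtt{new}\ \mathrm{sT} = e, \sigma \rangle \longrightarrow \varepsilon} \quad \text{if } \mathrm{new}_h(\upsilon) = \varepsilon$$

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \upsilon}{\rho \vdash_\beta \langle \mathtt{new}\ \mathrm{sT} = e, \sigma \rangle \longrightarrow \langle l, \sigma_0 \rangle} \quad \text{if } \mathrm{new}_h(\upsilon) = (\sigma_0, l)$$

Delete operator.

$$\frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \varepsilon}{\rho \vdash_\beta \langle \mathtt{delete}\ e, \sigma \rangle \longrightarrow \varepsilon} \qquad \frac{\rho \vdash_\beta \langle e, \sigma \rangle \longrightarrow \upsilon}{\rho \vdash_\beta \langle \mathtt{delete}\ e, \sigma \rangle \longrightarrow \mathrm{delete}_h(\upsilon)}$$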

Similar rules allow for allocation and deallocation of an array on the heap: note 
that, contrary to the previous cases, the dimensions of the array can be specified 
as expressions that will be evaluated dynamically. 
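A minimal Python sketch of the helper functions $\mathrm{new}_h$ and $\mathrm{delete}_h$, under assumed representations: the heap is a dictionary from integer locations to values, location 0 stands for $l_{\mathrm{null}}$, and a fixed capacity models allocation failure. Treating deletion of a null pointer as a no-op follows C++; reporting dangling or repeated deletion as an RTS exception is a modeling choice of the sketch, since C++ leaves those cases undefined.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


# Hypothetical heap segment (not the paper's memory structure): location 0
# plays the role of l_null and is never handed out.
@dataclass
class Mem:
    heap: Dict[int, object] = field(default_factory=dict)
    next_loc: int = 1
    capacity: int = 1024        # assumed limit, to model allocation failure


@dataclass
class RTSException:
    kind: str
    mem: Mem


def new_h(v: Tuple[object, Mem]) -> Union[Tuple[Mem, int], RTSException]:
    """Allocates a fresh cell initialized with sval; returns (sigma_0, l)."""
    sval, mem = v
    if len(mem.heap) >= mem.capacity:
        return RTSException("bad_alloc", mem)
    loc = mem.next_loc
    heap = dict(mem.heap)
    heap[loc] = sval
    return (Mem(heap, loc + 1, mem.capacity), loc)


def delete_h(v: Tuple[int, Mem]) -> Union[Mem, RTSException]:
    """Deallocates the cell at l; as in C++, deleting a null pointer is a no-op."""
    loc, mem = v
    if loc == 0:
        return mem
    if loc not in mem.heap:             # dangling pointer or double delete
        return RTSException("invalid_delete", mem)
    heap = dict(mem.heap)
    del heap[loc]
    return Mem(heap, mem.next_loc, mem.capacity)
```

Both functions return new memories instead of mutating their argument, mirroring the functional style of the rule schemata above.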

Regarding the abstract semantics, the extensions concerning C-like pointers and 
arrays as well as heap memory management can be obtained along the lines followed 
in Section 6. In particular, the new memory structure operators described above
are provided with safe approximations and a new abstract domain $\mathrm{Loc}^\sharp$ for location-valued
expressions has to be defined. By generalizing the abstract memory read
and update operators so as to take as input an abstract location, we realize the 
so-called weak read and weak update operators, so as to correctly deal with, e.g., 
assignments or function calls whose target is not statically known. In practice, no 
fundamentally new issue has to be solved as far as the specification of the abstract 
interpreter is concerned. This is not to say that these extensions are trivial; rather, 
the real issues (e.g., the efficient and accurate tracking of aliasing information for 
pointers [Ema93; EGH94] or the appropriate summarization techniques for large 
arrays [GRS05] and heap-allocated data [GDD+04; SRW02]) are orthogonal to the 
current approach and should be addressed elsewhere. 
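The distinction between strong and weak updates can be sketched in Python with deliberately simple, hypothetical abstractions: an abstract memory maps each variable to an interval, and an abstract location is the set of variables a pointer may denote. A real analyzer must also know whether a single abstract target summarizes several concrete cells (e.g., heap or array summaries), in which case only a weak update is sound; the sketch assumes this never happens.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical abstractions (not the paper's): one abstract cell per named
# variable holding an interval (None is bottom), and abstract locations
# represented as points-to sets of such cells.
Interval = Optional[Tuple[float, float]]
AbsMem = Dict[str, Interval]
AbsLoc = FrozenSet[str]


def join(a: Interval, b: Interval) -> Interval:
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def weak_read(mem: AbsMem, loc: AbsLoc) -> Interval:
    """The value read may come from any cell the location may denote."""
    result: Interval = None
    for cell in loc:
        result = join(result, mem[cell])
    return result


def weak_update(mem: AbsMem, loc: AbsLoc, value: Interval) -> AbsMem:
    """Writes value through an abstract location.

    With a single target (assumed here to stand for exactly one concrete
    cell) the old content is overwritten (strong update); otherwise each
    possible target may or may not be written, so old and new are joined."""
    new = dict(mem)
    if len(loc) == 1:
        (cell,) = loc
        new[cell] = value
    else:
        for cell in loc:
            new[cell] = join(mem[cell], value)
    return new
```

For example, writing [7, 7] through a pointer that may target either `a` (holding [0, 0]) or `b` (holding [5, 5]) yields [0, 7] and [5, 7], whereas a pointer known to target only `a` overwrites it with [7, 7].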

9.4 Non-Structured Control Flow Mechanisms 

It turns out that the approach we have chosen to model exceptional behavior of 
programs can be easily generalized so as to capture all the non-structured control 
flow mechanisms of languages such as C and C++. To exemplify such a general- 
ization, the abstract syntax of commands is extended with branching and labeled 
statements: 

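$\mathrm{Label} \ni l ::= \mathrm{id} \mid m \mid \mathtt{default}$

$\mathrm{Stmt} \ni s ::= \ldots \mid \mathtt{goto}\ \mathrm{id} \mid \mathtt{switch}\ e\ \mathtt{in}\ s \mid \mathtt{break} \mid \mathtt{continue} \mid \mathtt{return}\ e \mid l : s$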

We assume that the static semantics ensures the labels used in a function body are 
all distinct (if the language supports local labels, then a trivial renaming will be 
required) and that every goto has access to a corresponding labeled statement, re- 
specting the constraints imposed by the language (concerning, for instance, jumping 
into and outside blocks). 

The state of a computation is captured, besides the current program point, by 
a control mode and a memory structure, which together constitute what we call 
a control state. A control state is classified by the corresponding control mode in 
either a plain execution state or an exception state; a plain execution state can be 
further distinguished in either a normal execution state, or a branching state, or a 
value state (for computations yielding a proper value), or an environment state (for 
computations yielding an execution environment). 

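Definition 9.1. (GotoMode, SwitchMode, ValMode, EnvMode, ExceptMode,
CtrlMode, CtrlState.) The sets of goto, switch, value, environment, exception
and all control modes are given, respectively, by

$$\begin{aligned}
\mathrm{GotoMode} &\stackrel{\mathrm{def}}{=} \{\, \mathtt{goto}(\mathrm{id}) \mid \mathrm{id} \in \mathrm{Id} \,\},\\
\mathrm{SwitchMode} &\stackrel{\mathrm{def}}{=} \{\, \mathtt{switch}(\mathrm{sval}) \mid \mathrm{sval} \in \mathrm{sVal} \,\},\\
\mathrm{ValMode} &\stackrel{\mathrm{def}}{=} \{\, \mathtt{value}(\mathrm{sval}) \mid \mathrm{sval} \in \mathrm{sVal} \,\},\\
\mathrm{EnvMode} &\stackrel{\mathrm{def}}{=} \{\, \mathtt{env}(\rho) \mid \rho \in \mathrm{Env} \,\},\\
\mathrm{ExceptMode} &\stackrel{\mathrm{def}}{=} \{\, \mathtt{except}(\xi) \mid \xi \in \mathrm{Except} \,\},\\
\mathrm{CtrlMode} &\stackrel{\mathrm{def}}{=} \mathrm{GotoMode} \uplus \mathrm{SwitchMode} \uplus \mathrm{ValMode} \uplus \mathrm{EnvMode}\\
&\qquad \uplus \mathrm{ExceptMode} \uplus \{\mathtt{continue}, \mathtt{break}, \mathtt{return}, \mathtt{exec}\},
\end{aligned}$$

where continue, break and return are the exit modes and exec is the plain execution
mode. Control modes are denoted by $\mathit{cm}$, $\mathit{cm}_0$, $\mathit{cm}_1$ and so forth.

A control state is an element of $\mathrm{CtrlState} \stackrel{\mathrm{def}}{=} \mathrm{CtrlMode} \times \mathrm{Mem}$. Control states
are denoted by $\mathit{cs}$, $\mathit{cs}_0$, $\mathit{cs}_1$ and so forth.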

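The concrete semantics of the goto statement can now be expressed by

$$\rho \vdash_\beta \langle \mathtt{goto}\ \mathrm{id}, (\mathit{cm}, \sigma) \rangle \longrightarrow (\mathit{cm}_0, \sigma)$$

if $\mathit{cm} = \mathtt{exec}$ and $\mathit{cm}_0 = \mathtt{goto}(\mathrm{id})$, or $\mathit{cm} \neq \mathtt{exec}$ and $\mathit{cm}_0 = \mathit{cm}$.
The semantics of labeled statements is given by

$$\frac{\rho \vdash_\beta \langle s, (\mathit{cm}_0, \sigma) \rangle \longrightarrow \eta}{\rho \vdash_\beta \langle l : s, (\mathit{cm}, \sigma) \rangle \longrightarrow \eta}$$

where $\mathit{cm}_0 = \mathtt{exec}$ if $\mathit{cm} = \mathtt{exec}$, or $\mathit{cm} = \mathtt{goto}(\mathrm{id})$ and $l = \mathrm{id}$, or $\mathit{cm} = \mathtt{switch}(\mathrm{sval})$
and $l \in \{\mathtt{default}, \mathrm{sval}\}$; otherwise $\mathit{cm}_0 = \mathit{cm}$.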
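The goto and labeled-statement rules can be sketched in Python as transformations of control states. Control modes are encoded as tagged tuples, and a toy `run` function (hypothetical, covering only goto, labels and constant assignments) shows how a forward jump turns the intervening statements into nops until the matching label restores the exec mode.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical encoding of control modes as tagged tuples, e.g. ("exec",),
# ("goto", "L"), ("switch", 3), ("break",); memories are plain dicts.
Mode = Tuple[Any, ...]
CtrlState = Tuple[Mode, Dict[str, Any]]
EXEC: Mode = ("exec",)
DEFAULT = "default"        # the `default` label (a C keyword, never an id)


def goto_stmt(target: str, cs: CtrlState) -> CtrlState:
    """<goto id, (cm, sigma)> --> (cm0, sigma)."""
    cm, mem = cs
    return (("goto", target) if cm == EXEC else cm, mem)


def label_mode(label: Any, cm: Mode) -> Mode:
    """The mode cm0 in which s is executed for the labeled statement l : s."""
    if cm == EXEC:
        return EXEC
    if cm[0] == "goto" and label == cm[1]:
        return EXEC
    if cm[0] == "switch" and label in (DEFAULT, cm[1]):
        return EXEC
    return cm


def run(stmts: List[tuple], cs: CtrlState) -> CtrlState:
    """Executes a statement list; assignments are nops unless cm is exec."""
    for s in stmts:
        cm, mem = cs
        if s[0] == "goto":
            cs = goto_stmt(s[1], cs)
        elif s[0] == "label":                    # ("label", l, s)
            cs = run([s[2]], (label_mode(s[1], cm), mem))
        elif s[0] == "assign" and cm == EXEC:    # ("assign", x, constant)
            cs = (cm, {**mem, s[1]: s[2]})
    return cs
```

Running `goto L; x := 1; L: y := 2` from `(EXEC, {})` skips the assignment to `x` and ends in `(EXEC, {"y": 2})`. Backward jumps are not handled by this toy interpreter; the text below explains how they are dealt with.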

Of course, the semantics of all statements must be suitably modified. For in-
stance, the assignment should behave like a nop unless the control mode is the
normal execution one. Statements with non-trivial control flow need more work.
For example, the semantics of the conditional statement can be captured by$^{14}$

php (e, (exec.tr)) -► (cm , cr ) 

if cmo £ FxceptMode 

p \-p (if ethenso elsesi, (exec, cr)) — > (cm , cr ) 

p hp (e, (exec, cr)) -> (value(tt), cr ) p hp (s , (exec, a )) ->■ (emi, cti) 

p hp (si, (cmi,cri)) -> n (124) 

p hp (if e then so elsesi, (exec, a)) — > 77 
if emi e GotoMode; 

p 1—^3 (e, (exec, cr)) (value(tt), cr ) p hp (s , (exec, cr )) -*■ (emi, cti) 
p hp (if e then s elsesi, (exec, cr)) — ► (emi, a\) 
if emi ^ GotoMode; 

p (e, (exec, cr)) ->■ (value(ff), cr ) p hp (si, (exec, cr )) -> ry 
p (if e then s elses^ (exec, cr)) — ► 77 

P 1-/3 (s , (cm, cr)) (cm , cr ) 

p hp (if e then so elsesi, (cm, cr)) — ► (cmo, cro) 
if cm e GotoMode l±l SwitchMode and cm ^ GotoMode W SwitchMode; 
p 1-/3 (s , (cm, a)) -> (cmo, cr ) p 1-/3 (si, (cmo, <T )) ->■ ?? 
p (if e then s else si, (cm, cr)) — ► r\ 

14 Recall that, in C, it is perfectly legal to jump into the "else branch" from the "then branch." 
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if cm G GotoMode W SwitchMode and cm G GotoMode l+l SwitchMode; 



p hp (if ethensn elsesi, (cm, cr)) — ► (cm, cr) 

lotoMode W SwitchMode W {exe< 
ie, the semantics of the switch 

P r-/3 (e, (exec, cr)) -> (cm , cr ) 
p hp (switch e ins, (exec, cr)) — > (cm , cr ) 

p h/3 (e, (exec, cr)) — > ( value (svaln), en) 

p (s, (switch(svaln), cr )) — > (cmi,(Ti) 



if cm ^ GotoMode W SwitchMode W {exec}. 

Likewise, the semantics of the switch statement can be captured by: 

if cm G ExceptMode 



exec, if emi G SwitchMode W {break}, 
cnii, otherwise; 



php (switch e ins, (exec, cr)) — > (cm 2 ,cri) 
if cm 2 = 

p h/3 (s, (goto(id), cr)) -> (cm , ct ) 
p hp (switch e in s, (goto(id), cr)) — > (cmi,cro) 

(exec, if cm = break, 
if cm! = < 

lcm , otherwise; 

if cm ^ GotoMode HJ {exec}. 

p \-p (switch e in s, (cm, cr)) — > (cm, cr) 
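To see how the control register drives these rules, here is a toy big-step interpreter in Python for a fragment with assignment, sequencing, labels, goto, break, if and switch. It is a sketch under explicit assumptions, not the CPM implementation: statements are hypothetical tuples such as `("if", cond, s0, s1)`; guards and switch expressions are Python callables on the memory that never raise, so the ExceptMode rules are not modeled; and sequential composition, whose rule is not shown in this section, simply threads the control state from the first statement to the second.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Goto:
    label: str

@dataclass(frozen=True)
class Switch:
    sval: Any

EXEC = "exec"

def is_jump(cm: Any) -> bool:
    """Membership in GotoMode ⊎ SwitchMode."""
    return isinstance(cm, (Goto, Switch))

def run(s: tuple, cm: Any, sigma: dict) -> tuple:
    """Big-step evaluation of statement s from control state (cm, sigma)."""
    kind = s[0]
    if kind == "set":                        # ("set", x, f): x := f(sigma)
        _, x, f = s
        return (cm, {**sigma, x: f(sigma)}) if cm == EXEC else (cm, sigma)
    if kind == "seq":                        # ("seq", s0, s1)
        cm0, sigma0 = run(s[1], cm, sigma)
        return run(s[2], cm0, sigma0)
    if kind == "goto":                       # ("goto", id)
        return (Goto(s[1]) if cm == EXEC else cm, sigma)
    if kind == "break":                      # ("break",)
        return ("break" if cm == EXEC else cm, sigma)
    if kind == "label":                      # ("label", l, body)
        _, l, body = s
        hit = ((isinstance(cm, Goto) and cm.label == l)
               or (isinstance(cm, Switch) and l in ("default", cm.sval)))
        return run(body, EXEC if hit else cm, sigma)
    if kind == "if":                         # ("if", cond, s0, s1)
        _, cond, s0, s1 = s
        if cm == EXEC:
            if cond(sigma):
                cm1, sigma1 = run(s0, EXEC, sigma)
                # rule (124): a goto leaving the then branch may target the else branch
                return run(s1, cm1, sigma1) if isinstance(cm1, Goto) else (cm1, sigma1)
            return run(s1, EXEC, sigma)
        if is_jump(cm):                      # search the label in s0, then in s1
            cm0, sigma0 = run(s0, cm, sigma)
            return run(s1, cm0, sigma0) if is_jump(cm0) else (cm0, sigma0)
        return (cm, sigma)
    if kind == "switch":                     # ("switch", e, body)
        _, e, body = s
        if cm == EXEC:
            cm1, sigma1 = run(body, Switch(e(sigma)), sigma)
            return (EXEC if isinstance(cm1, Switch) or cm1 == "break" else cm1, sigma1)
        if isinstance(cm, Goto):
            cm0, sigma0 = run(body, cm, sigma)
            return (EXEC if cm0 == "break" else cm0, sigma0)
        return (cm, sigma)
    raise ValueError(f"unknown statement kind: {kind!r}")

# switch (x) { case 1: y = 10; case 2: y = y + 1; break; default: y = 0; }
SWITCH_EXAMPLE = ("switch", lambda m: m["x"],
    ("seq", ("label", 1, ("set", "y", lambda m: 10)),
     ("seq", ("label", 2, ("seq", ("set", "y", lambda m: m.get("y", 0) + 1), ("break",))),
             ("label", "default", ("set", "y", lambda m: 0)))))

# if (c) goto L; else L: y = 5;   (a jump from the then branch into the else branch)
JUMP_INTO_ELSE = ("if", lambda m: m["c"], ("goto", "L"),
                  ("label", "L", ("set", "y", lambda m: 5)))
```

With `x = 1` the first case falls through into the second and stops at `break`, giving `y = 11`; with `x = 5` only the default label matches. `JUMP_INTO_ELSE` reaches the assignment in the else branch through rule (124) even when `c` is true.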

While such a semantic treatment captures all forward jumps, for backward jumps something more is required. One simple possibility (which is not the only one) is to explicitly introduce a looping construct that is (only) available in the abstract syntax. That is, we extend Stmt once again as

$$
\mathrm{Stmt} \ni s ::= \ldots \mid \mathbf{loop}\ s
$$

and assume that a set of such loops has been inserted so that all backward jumps are enclosed in at least one loop (notice that at most one such loop per function body suffices, but more can be used as a matter of optimization). For $s \in \mathrm{Stmt}$, let $\mathrm{SL}(s)$ denote the set of statement labels in $s$. The concrete semantics of this looping construct is now given by

$$
\frac{\rho \vdash_\beta \langle s, cs \rangle \to (cm, \sigma)}
     {\rho \vdash_\beta \langle \mathbf{loop}\ s, cs \rangle \to (cm, \sigma)}
\qquad \text{if } cm \neq \mathrm{goto}(id) \text{ for each } id \in \mathrm{SL}(s);
$$

$$
\frac{\rho \vdash_\beta \langle s, cs \rangle \to (\mathrm{goto}(id), \sigma)
      \quad \rho \vdash_\beta \langle \mathbf{loop}\ s, (\mathrm{goto}(id), \sigma) \rangle \to \eta}
     {\rho \vdash_\beta \langle \mathbf{loop}\ s, cs \rangle \to \eta}
\qquad \text{if } id \in \mathrm{SL}(s).
$$

Observe that the systematic use of the looping construct can make rule schema (124) redundant.
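The loop construct can be added to the same kind of toy interpreter. In this self-contained sketch (again with hypothetical tuple-encoded statements and Python callables as guards; switch and break are left out for brevity), `stmt_labels` computes SL(s), and the second loop rule is implemented iteratively: since its recursive premise is the last step of the rule, re-running the body in goto mode inside a `while` loop gives the same result for terminating computations, and diverges exactly when the recursive reading does.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Goto:
    label: str

EXEC = "exec"

def stmt_labels(s: tuple) -> set:
    """SL(s): the statement labels occurring in s."""
    kind = s[0]
    if kind == "label":
        return {s[1]} | stmt_labels(s[2])
    if kind == "seq":
        return stmt_labels(s[1]) | stmt_labels(s[2])
    if kind == "if":
        return stmt_labels(s[2]) | stmt_labels(s[3])
    if kind == "loop":
        return stmt_labels(s[1])
    return set()

def run(s: tuple, cm: Any, sigma: dict) -> tuple:
    kind = s[0]
    if kind == "set":                        # ("set", x, f): x := f(sigma)
        _, x, f = s
        return (cm, {**sigma, x: f(sigma)}) if cm == EXEC else (cm, sigma)
    if kind == "seq":                        # thread the control state
        cm0, sigma0 = run(s[1], cm, sigma)
        return run(s[2], cm0, sigma0)
    if kind == "goto":
        return (Goto(s[1]) if cm == EXEC else cm, sigma)
    if kind == "label":
        _, l, body = s
        hit = isinstance(cm, Goto) and cm.label == l
        return run(body, EXEC if hit else cm, sigma)
    if kind == "if":
        _, cond, s0, s1 = s
        if cm == EXEC:
            if cond(sigma):
                cm1, sigma1 = run(s0, EXEC, sigma)
                return run(s1, cm1, sigma1) if isinstance(cm1, Goto) else (cm1, sigma1)
            return run(s1, EXEC, sigma)
        if isinstance(cm, Goto):
            cm0, sigma0 = run(s0, cm, sigma)
            return run(s1, cm0, sigma0) if isinstance(cm0, Goto) else (cm0, sigma0)
        return (cm, sigma)
    if kind == "loop":                       # ("loop", body)
        body = s[1]
        targets = stmt_labels(body)
        cm1, sigma1 = run(body, cm, sigma)
        # Second loop rule, unrolled: re-enter the body while it ends in
        # goto(id) with id in SL(body).
        while isinstance(cm1, Goto) and cm1.label in targets:
            cm1, sigma1 = run(body, cm1, sigma1)
        return (cm1, sigma1)                 # first loop rule applies
    raise ValueError(f"unknown statement kind: {kind!r}")

# loop { L: x = x + 1; if (x < 3) goto L; else done = 1; }
COUNTER = ("loop", ("seq",
    ("label", "L", ("set", "x", lambda m: m["x"] + 1)),
    ("if", lambda m: m["x"] < 3, ("goto", "L"), ("set", "done", lambda m: 1))))
```

A goto whose target lies outside the loop body, such as `("loop", ("goto", "Out"))`, leaves the loop with its goto mode intact, so that an enclosing statement can resolve it.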

Other rules are omitted for space reasons. However, there are no additional
difficulties besides the ones just addressed: the rules for break and continue
are straightforward; return e can be modeled as the assignment to the reserved
identifier x (see concrete rule (67)), followed by the setting of the control mode;
the rules for the while loop are somewhat more involved, as they must support the 'break' and
'continue' control modes in addition to 'goto' and 'switch'.

The proposed approach handles non-structured control flow mechanisms essentially by adding a sort of control register to the rule-based interpreter of the language. As far as the abstract semantics is concerned, a first choice to be made concerns the approximation of the values that the control register can take. As usual, there is a complexity/precision trade-off to be faced: the simple solution is to approximate $\wp(\mathrm{CtrlMode})$ by some (simple) abstract domain $\mathrm{CtrlMode}^\sharp$ and then approximate $\mathrm{CtrlState} = \mathrm{CtrlMode} \times \mathrm{Mem}$ by $\mathrm{CtrlMode}^\sharp \otimes \mathrm{Mem}^\sharp$; a more precise solution is to approximate $\wp(\mathrm{CtrlState})$ by an abstract domain $\mathrm{CtrlState}^\sharp$ that captures relational information connecting the control modes to the memory structures they can be coupled with. The abstract rule schemata must, of course, be modified to match the concrete world. For instance, the abstract rule for the conditional statement becomes:



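$$
\frac{\rho \vdash_\beta \langle e, cs^\sharp_{\mathrm{cond}} \rangle \to cs^\sharp_0
      \quad \rho \vdash_\beta \langle s_0, cs^\sharp_{\mathrm{then}} \rangle \to cs^\sharp_1
      \quad \rho \vdash_\beta \langle s_1, cs^\sharp_{\mathrm{else}} \rangle \to cs^\sharp_2}
     {\rho \vdash_\beta \langle \mathbf{if}\ e\ \mathbf{then}\ s_0\ \mathbf{else}\ s_1, cs^\sharp \rangle \to cs^\sharp_3}
$$

where

$$
\begin{aligned}
cs^\sharp_{\mathrm{cond}} &= \Phi_e(\rho, cs^\sharp, \mathrm{tt}),\\
cs^\sharp_{\mathrm{then}} &= \Phi_e(\rho, cs^\sharp, e) \sqcup \Phi_m(cs^\sharp, \mathrm{GotoMode} \uplus \mathrm{SwitchMode}),\\
cs^\sharp_{\mathrm{else}} &= \Phi_e(\rho, cs^\sharp, \mathbf{not}\ e) \sqcup \Phi_m(cs^\sharp_1, \mathrm{GotoMode}) \sqcup cs^\sharp_{\mathrm{jump}},\\
cs^\sharp_{\mathrm{jump}} &= \begin{cases} \bot, & \text{if } \Phi_m(cs^\sharp, \mathrm{GotoMode} \uplus \mathrm{SwitchMode}) = \bot,\\ \Phi_m(cs^\sharp_1, C_{\mathrm{jump}}), & \text{otherwise}, \end{cases}\\
C_{\mathrm{jump}} &= \mathrm{GotoMode} \cup \{\, cm \in \mathrm{CtrlMode} \mid \exists \sigma \in \mathrm{Mem} \mathrel{.} (cm, \sigma) \in \gamma(cs^\sharp) \,\},\\
cs^\sharp_3 &= \Phi_m\bigl(cs^\sharp, \mathrm{CtrlMode} \setminus (\{\mathrm{exec}\} \uplus \mathrm{GotoMode} \uplus \mathrm{SwitchMode})\bigr)\\
&\qquad \sqcup \Phi_m(cs^\sharp_0, \mathrm{CtrlMode} \setminus \mathrm{ValMode}) \sqcup cs^\sharp_1 \sqcup cs^\sharp_2,
\end{aligned}
$$

and the two computable filter functions $\Phi_e : (\mathrm{Env} \times \mathrm{CtrlState}^\sharp \times \mathrm{Exp}) \to \mathrm{CtrlState}^\sharp$ and $\Phi_m : (\mathrm{CtrlState}^\sharp \times \wp(\mathrm{CtrlMode})) \to \mathrm{CtrlState}^\sharp$ are specified as follows, for each $\rho \in \mathrm{Env}$, $cs^\sharp \in \mathrm{CtrlState}^\sharp$, $e \in \mathrm{Exp}$ and $C \subseteq \mathrm{CtrlMode}$ such that, for some $\beta \in \mathrm{TEnv}$ with $\beta : I$, $\mathrm{FI}(e) \subseteq I$ and $\beta \vdash_I e : \mathrm{boolean}$:

$$
\gamma\bigl(\Phi_e(\rho, cs^\sharp, e)\bigr) \supseteq
\left\{\, cs \in \gamma(cs^\sharp) \;\middle|\;
\begin{array}{l}
\exists \sigma \in \mathrm{Mem} \mathrel{.} cs = (\mathrm{exec}, \sigma),\\
\exists \sigma' \in \mathrm{Mem} \mathrel{.} \rho \vdash_\beta \langle e, cs \rangle \to (\mathrm{value}(\mathrm{tt}), \sigma')
\end{array}
\right\},
$$

$$
\gamma\bigl(\Phi_m(cs^\sharp, C)\bigr) \supseteq \{\, cs \in \gamma(cs^\sharp) \mid \exists \sigma \in \mathrm{Mem} \mathrel{.} cs = (cm, \sigma),\ cm \in C \,\}.
$$
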
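The filter functions only have to satisfy the soundness conditions above, so many implementations are possible. The Python sketch below assumes a deliberately simple instance of the more precise, relational choice of CtrlState♯: a map from each control mode to an interval for a single integer variable `x`, so that every mode carries its own memory approximation, with the empty map playing the role of ⊥. All names are hypothetical: `phi_m` receives C as a membership predicate (sets such as GotoMode are infinite), and `phi_e` handles only the guards `x <= c` and `x > c`, ignoring ρ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple

@dataclass(frozen=True)
class Goto:
    label: str

EXEC = "exec"
Interval = Tuple[int, int]                   # [lo, hi] bounds for the integer variable x
AbsCtrlState = Dict[Hashable, Interval]      # control mode -> abstract memory; {} is bottom

def phi_m(cs: AbsCtrlState, in_c: Callable[[Hashable], bool]) -> AbsCtrlState:
    """Mode filter: keep exactly the entries whose control mode belongs to C."""
    return {cm: itv for cm, itv in cs.items() if in_c(cm)}

def phi_e(cs: AbsCtrlState, op: str, c: int) -> AbsCtrlState:
    """Expression filter for the guards `x <= c` and `x > c`: keep the exec
    entry, narrowed to the values of x for which the guard evaluates to tt."""
    if EXEC not in cs:
        return {}
    lo, hi = cs[EXEC]
    if op == "<=":
        hi = min(hi, c)
    elif op == ">":
        lo = max(lo, c + 1)                  # x is integer-valued
    else:
        raise ValueError(f"unsupported guard operator: {op!r}")
    return {EXEC: (lo, hi)} if lo <= hi else {}

def join(a: AbsCtrlState, b: AbsCtrlState) -> AbsCtrlState:
    """Least upper bound: per control mode, the interval hull of both operands."""
    out = dict(a)
    for cm, (lo, hi) in b.items():
        if cm in out:
            out[cm] = (min(out[cm][0], lo), max(out[cm][1], hi))
        else:
            out[cm] = (lo, hi)
    return out

def in_goto(cm: Hashable) -> bool:
    return isinstance(cm, Goto)              # GotoMode (switch modes omitted here)

# Entry state: exec with x in [0, 10], a pending goto(L) with x = 5, and a break.
CS = {EXEC: (0, 10), Goto("L"): (5, 5), "break": (1, 2)}
CS_THEN = join(phi_e(CS, "<=", 3), phi_m(CS, in_goto))    # guard e is `x <= 3`
CS_ELSE_GUARD = phi_e(CS, ">", 3)                         # the Phi_e(rho, cs, not e) part
```

For the guard `x <= 3`, `CS_THEN` keeps the exec entry narrowed to [0, 3] together with the pending goto entry, mirroring the definition of cs♯_then, while the `break` entry is filtered out and would only reach cs♯_3.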
10. CONCLUSION 

In this paper, we have confronted the problem of defining an analysis framework
for the specification and realization of precise static analyzers for mainstream imperative
programming languages, tools in very short supply that, however, ought
to become part of current programming practice. A proposal put forward by
Schmidt twelve years ago [Sch95] held, in our eyes, considerable promise, despite
the fact that it had not been fully developed and applied in realistic contexts. It was
therefore natural to question whether the promise could be fulfilled. To investi- 
gate Schmidt's approach, which is based on structured operational semantics and 
abstract interpretation, we have defined an imperative language, CPM, that em- 
bodies all the "problematic features" of single-threaded imperative languages now 
in widespread use. We have presented a concrete semantics of CPM that is suit- 
able for abstraction while retaining all the nice features of SOS descriptions. For 
a subset of the language we have formally defined an abstract semantics that can 
fully exploit the precision offered by relational abstract domains, and proved its 
soundness with respect to the concrete one. We have also shown how approxima- 
tions of the abstract semantics can be effectively computed. In order to provide 
an experimental evaluation of the ideas presented in this paper, both the concrete 
and the abstract semantics — instantiated over sophisticated numeric domains and 
together with a suitable fixpoint computation engine — have been incorporated into 
the ECLAIR system. This work allows us to conclude that the proposal of Schmidt 
can play a crucial role in the development of reliable and precise analyzers. The 
key features of this approach are: 

— a fairly concise concrete semantics that experts can easily read (and modify as 
needed) and that everyone can execute on non-trivial examples in order to check 
its agreement with the applicable language standards; 

— a fairly concise abstract semantics that is fully parametric with respect to the 
abstract domain, that is not difficult to prove correct with respect to the concrete 
one (to the point that automatizing the proof seems to be a reasonable goal), 
and that directly leads to the implementation of static analyzers. 

Of course, the story does not end here. For instance, our analysis framework is 
parametric on abstract memory structures. While the literature seems to provide all 
that is necessary to realize very sophisticated ones, it is not difficult to predict that, 
among all the code out there waiting to be analyzed, some will greatly exacerbate 
the complexity/precision trade-off. However, these are research problems for the
future — now that we have, as given here, a formal design on which analyzers can 
be built, our next goal is to complete the build and make the technology described 
here truly available and deployable. 

ACKNOWLEDGMENTS 

Anna Dolma Alonso, Irene Bacchi, Danilo Bonardi, Andrea Cimino, Enrico Franchi, 
Davide Masi and Alessandro Vincenzi (all students of the course on "Analysis and
Verification of Software" taught by Roberto Bagnara at the University of Parma) 
and Vajirapan Panumong (University of Leeds) collaborated on previous, much 
more restricted versions of this work. We are also grateful to David Merchat (for- 
merly at the University of Parma) and Katy Dobson (University of Leeds) for the 
discussions we have had on the subject of this paper. 



70 • R. Bagnara, P.M. Hill, A. Pescetti, and E. Zaffanella 



REFERENCES 

B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monniaux, and X. Rival, Design and implementation of a special-purpose static program analyzer for safety-critical real-time embedded software, The Essence of Computation, Complexity, Analysis, Transformation. Essays Dedicated to Neil D. Jones [on occasion of his 60th birthday] (T. M. Mogensen, D. A. Schmidt, and I. Hal Sudborough, eds.), Lecture Notes in Computer Science, vol. 2566, Springer-Verlag, Berlin, 2002, pp. 85-108.

B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monniaux, and X. Rival, A static analyzer for large safety-critical software, Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI'03) (San Diego, California, USA), ACM Press, 2003, pp. 196-207.

R. Bagnara, P. M. Hill, E. Ricci, and E. Zaffanella, Precise widening operators for convex polyhedra, Science of Computer Programming 58 (2005), no. 1-2, 28-56.

R. Bagnara, P. M. Hill, and E. Zaffanella, Not necessarily closed convex polyhedra and the double description method, Formal Aspects of Computing 17 (2005), no. 2, 222-257.

R. Bagnara, P. M. Hill, and E. Zaffanella, The Parma Polyhedra Library: Toward a complete set of numerical abstractions for the analysis and verification of hardware and software systems, Quaderno 457, Dipartimento di Matematica, Università di Parma, Italy, 2006. Available at http://www.cs.unipr.it/Publications/. Also published as arXiv:cs.MS/0612085, available from http://arxiv.org/.

M. Bruynooghe, A practical framework for the abstract interpretations of logic programs, Journal of Logic Programming 10 (1991), 91-124.

P. Cousot and R. Cousot, Static determination of dynamic properties of programs, Proceedings of the Second International Symposium on Programming (Paris, France) (B. Robinet, ed.), Dunod, Paris, France, 1976, pp. 106-130.

P. Cousot and R. Cousot, Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints, Proceedings of the Fourth Annual ACM Symposium on Principles of Programming Languages (New York), ACM Press, 1977, pp. 238-252.

P. Cousot and R. Cousot, Static determination of dynamic properties of recursive procedures, IFIP Conference on Formal Description of Programming Concepts (E. J. Neuhold, ed.), North-Holland, 1977, pp. 237-277.

P. Cousot and R. Cousot, Systematic design of program analysis frameworks, Proceedings of the Sixth Annual ACM Symposium on Principles of Programming Languages (New York), ACM Press, 1979, pp. 269-282.

P. Cousot and R. Cousot, Abstract interpretation frameworks, Journal of Logic and Computation 2 (1992), no. 4, 511-547.

P. Cousot and R. Cousot, Comparing the Galois connection and widening/narrowing approaches to abstract interpretation, Proceedings of the 4th International Symposium on Programming Language Implementation and Logic Programming (Leuven, Belgium) (M. Bruynooghe and M. Wirsing, eds.), Lecture Notes in Computer Science, vol. 631, Springer-Verlag, Berlin, 1992, pp. 269-295.

P. Cousot and R. Cousot, Inductive definitions, semantics and abstract interpretation, Proceedings of the Nineteenth Annual ACM Symposium on Principles of Programming Languages (Albuquerque, New Mexico, USA), ACM Press, 1992, pp. 83-94.

P. Cousot and R. Cousot, Higher-order abstract interpretation (and application to comportment analysis generalizing strictness, termination, projection and PER analysis of functional languages), Proceedings of the IEEE Computer Society 1994 International Conference on Computer Languages (Toulouse, France) (H. E. Bal, ed.), IEEE Computer Society Press, 1994, Invited paper, pp. 95-112.

P. Cousot and N. Halbwachs, Automatic discovery of linear restraints among variables of a program, Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages (Tucson, Arizona), ACM Press, 1978, pp. 84-96.

P. Cousot, Semantic foundations of program analysis, Program Flow Analysis: Theory and Applications (S. S. Muchnick and N. D. Jones, eds.), Prentice Hall, Englewood Cliffs, NJ, USA, 1981, pp. 303-342.

P. Cousot, The calculational design of a generic abstract interpreter, Calculational System Design (M. Broy and R. Steinbrüggen, eds.), NATO ASI Series F, IOS Press, Amsterdam, NL, 1999.



On the Design of Generic Static Analyzers for Modern Imperative Languages • 71 



P. Cousot, The verification grand challenge and abstract interpretation, Verified Software: Theories, Tools, Experiments (VSTTE) (ETH Zurich, Switzerland), 2005, Position paper.

N. Dor, M. Rodeh, and S. Sagiv, Cleanness checking of string manipulations in C programs via integer analysis, Static Analysis: 8th International Symposium, SAS 2001 (Paris, France) (P. Cousot, ed.), Lecture Notes in Computer Science, vol. 2126, Springer-Verlag, Berlin, 2001, pp. 194-212.

M. Emami, R. Ghiya, and L. J. Hendren, Context-sensitive interprocedural points-to analysis in the presence of function pointers, Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation (Orlando, Florida), vol. 29, ACM SIGPLAN Notices, no. 6, Association for Computing Machinery, 1994, pp. 242-256.

M. Emami, A practical inter-procedural alias analysis for an optimizing/paralleling C compiler, Master's thesis, School of Computer Science, McGill University, Montreal, Canada, August 1993.

D. Gopan, F. DiMaio, N. Dor, T. Reps, and M. Sagiv, Numeric domains with summarized dimensions, Tools and Algorithms for the Construction and Analysis of Systems, 10th International Conference, TACAS 2004 (Barcelona, Spain) (K. Jensen and A. Podelski, eds.), Lecture Notes in Computer Science, vol. 2988, Springer-Verlag, Berlin, 2004, pp. 512-529.

R. Giacobazzi, S. K. Debray, and G. Levi, A generalized semantics for constraint logic programs, Proceedings of the International Conference on Fifth Generation Computer Systems (FGCS'92) (Tokyo, Japan), ICOT, 1992, pp. 581-591.

D. Gopan, T. W. Reps, and M. Sagiv, A framework for numeric analysis of array operations, Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Long Beach, California, USA), 2005, pp. 338-350.

N. Halbwachs, Delay analysis in synchronous programs, Computer Aided Verification: Proceedings of the 5th International Conference (Elounda, Greece) (C. Courcoubetis, ed.), Lecture Notes in Computer Science, vol. 697, Springer-Verlag, Berlin, 1993, pp. 333-346.

C. A. R. Hoare, The verifying compiler: A grand challenge for computing research, Journal of the ACM 50 (2003), no. 1, 63-69.

N. Halbwachs, Y.-E. Proy, and P. Roumanoff, Verification of real-time systems using linear relation analysis, Formal Methods in System Design 11 (1997), no. 2, 157-185.

B. Jeannet and W. Serwe, Abstracting call-stacks for interprocedural verification of imperative programs, Publication interne 1543, IRISA, Campus de Beaulieu, Rennes, France, 2003.

B. Jeannet and W. Serwe, Abstracting call-stacks for interprocedural verification of imperative programs, Proceedings of the 10th International Conference on Algebraic Methodology and Software Technology (Stirling, Scotland, UK) (C. Rattray, S. Maharaj, and C. Shankland, eds.), Lecture Notes in Computer Science, vol. 3116, Springer-Verlag, Berlin, 2004, pp. 258-273.

G. Kahn, Natural semantics, Proceedings of the 4th Annual Symposium on Theoretical Aspects of Computer Science (Passau, Germany) (F.-J. Brandenburg, G. Vidal-Naquet, and M. Wirsing, eds.), Lecture Notes in Computer Science, vol. 247, Springer-Verlag, Berlin, 1987, pp. 22-39.

X. Leroy, Coinductive big-step operational semantics, Programming Languages and Systems, Proceedings of the 14th European Symposium on Programming (Vienna, Austria) (P. Sestoft, ed.), Lecture Notes in Computer Science, vol. 3924, Springer-Verlag, Berlin, 2006, pp. 54-68.

A. Miné, Field-sensitive value analysis of embedded C programs with union types and pointer arithmetics, Proceedings of the 2006 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (Ottawa, Ontario, Canada) (M. J. Irwin and K. De Bosschere, eds.), ACM Press, 2006, pp. 54-63.

G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer, CIL: Intermediate language and tools for analysis and transformation of C programs, Compiler Construction: Proceedings of the 11th International Conference (CC 2002) (Grenoble, France) (R. N. Horspool, ed.), Lecture Notes in Computer Science, vol. 2304, Springer-Verlag, Berlin, 2002, pp. 213-228.

G. D. Plotkin, A structural approach to operational semantics, Journal of Logic and Algebraic Programming 60-61 (2004), 17-139.






D. A. Schmidt, Natural-semantics-based abstract interpretation (preliminary version), Static Analysis: Proceedings of the 2nd International Symposium (Glasgow, UK) (A. Mycroft, ed.), Lecture Notes in Computer Science, vol. 983, Springer-Verlag, Berlin, 1995, pp. 1-18.

D. A. Schmidt, Abstract interpretation of small-step semantics, Analysis and Verification of Multiple-Agent Languages (M. Dam, ed.), Lecture Notes in Computer Science, vol. 1192, Springer-Verlag, Berlin, 1997, 5th LOMAPS Workshop Stockholm, Sweden, June 24-26, 1996, Selected Papers, pp. 76-99.

D. A. Schmidt, Trace-based abstract interpretation of operational semantics, LISP and Symbolic Computation 10 (1998), no. 3, 237-271.

R. Shaham, E. K. Kolodner, and S. Sagiv, Automatic removal of array memory leaks in Java, Proceedings of the 9th International Conference on Compiler Construction (CC 2000) (Berlin, Germany) (D. A. Watt, ed.), Lecture Notes in Computer Science, vol. 1781, Springer-Verlag, Berlin, 2000, pp. 50-66.

M. Sharir and A. Pnueli, Two approaches to interprocedural data flow analysis, Program Flow Analysis: Theory and Applications (S. S. Muchnick and N. D. Jones, eds.), Prentice Hall, Englewood Cliffs, NJ, USA, 1981, pp. 189-233.

S. Sagiv, T. W. Reps, and R. Wilhelm, Parametric shape analysis via 3-valued logic, ACM Transactions on Programming Languages and Systems 24 (2002), no. 3, 217-298.



